hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-12-04 13:54:05 +00:00

Author	SHA1	Message	Date
Ari Lemmetti	146298a0df	New AVX2 block averaging WIP missing small chroma block and SMP/AMP	2021-11-08 23:01:13 +02:00
Ari Lemmetti	ef69c65c58	New bipred average functions	2021-11-08 23:01:12 +02:00
Ari Lemmetti	f47bd5d86f	Rename some bipred functions	2021-11-08 23:01:12 +02:00
Ari Lemmetti	b52a930bed	About working with generics	2021-11-08 23:01:12 +02:00
Ari Lemmetti	e7857cbb24	Remove avx2 blending	2021-11-08 22:45:45 +02:00
Marko Viitanen	4a42b5cbc4	[cleanup] Remove HMVP debug code and extra arrays in intra coding	2021-11-08 10:11:17 +02:00
Marko Viitanen	73c4128100	[quant] Map scalinglistType correctly	2021-10-29 09:10:15 +03:00
Marko Viitanen	492d22e8be	Disable interpolation AVX2 optimizations for now	2021-10-29 08:43:52 +03:00
Marko Viitanen	57883369ca	Change all the license texts in source headers and LICENSE file to 3-clause BSD, closes #302 * All now have the same exact text string	2021-10-13 15:22:46 +03:00
Ari Lemmetti	171b9c60b3	[SIMD] Convert planar and DC mode PDPC loops to AVX2	2021-09-08 03:40:38 +03:00
Ari Lemmetti	ad35d4a4c8	[SIMD] Loop transformation, prepare data for latter loop	2021-09-06 22:38:37 +03:00
Ari Lemmetti	22da8cfe65	[SIMD] Loop transformations for SIMD processing	2021-09-06 22:30:36 +03:00
Ari Lemmetti	c195d906d3	[SIMD] Copy generic implementation of planar/DC PDPC as a skeleton	2021-09-06 21:20:51 +03:00
Ari Lemmetti	c6b33c7b92	[SIMD] Move PDPC condition out of strategy	2021-09-06 21:20:51 +03:00
Ari Lemmetti	46cf9b6871	[SIMD] Make strategy out of PDPC for planar and DC	2021-09-06 21:20:51 +03:00
Ari Lemmetti	816e7a5a91	[SIMD] Replace PDPC remainder loop with masking operations	2021-09-06 21:20:51 +03:00
Ari Lemmetti	1926b4cc27	[SIMD] Initial AVX2 code for transpose in angular prediction	2021-09-06 21:20:50 +03:00
Ari Lemmetti	913573baca	[SIMD] Initial AVX2 code for PDPC in angular prediction	2021-09-06 21:20:50 +03:00
Ari Lemmetti	7ccd1a571c	[SIMD] Initial AVX2 code for 4-tap filtering in angular prediction.	2021-09-06 21:20:50 +03:00
Ari Lemmetti	20f0ff976d	[SIMD] Transform angular pred loops for SIMD processing.	2021-09-06 21:20:49 +03:00
Ari Lemmetti	3dfe09e850	[SIMD] Copy generic implementation of angular prediction as a skeleton.	2021-09-06 21:20:46 +03:00
Joose Sainio	450cbd356c	Merge branch 'joint_cbcr' into 'master' [jccr] Add joint coding of chroma residual See merge request cs/ultravideo/vvc/uvg266!6	2021-09-06 11:43:06 +03:00
Joose Sainio	0592cc65a0	[jccr] enable rdoq with jccr	2021-09-06 11:28:20 +03:00
Joose Sainio	072b84711a	[jccr] fix 64×64 CUs	2021-09-06 11:28:20 +03:00
Joose Sainio	042b5078d8	[jccr] WIP initial implementation. Add some kind of search for joint chroma residual coding. Bitstream is currently correct but prediction is incorrect because the jccr is actually not used in the search. Hard coded to be enabled.	2021-09-06 11:28:08 +03:00
Marko Viitanen	26f18865f7	[alf] Change the processing in alf_get_blk_stats_avx2() to allow utilizing the whole 256bit register	2021-08-27 13:40:28 +03:00
Marko Viitanen	fdf125f406	[alf] Fix incorrect conversion in alf_get_blk_stats_avx2	2021-08-27 10:25:20 +03:00
Marko Viitanen	6714973264	[alf] Change _mm_store_si128 to _mm_storeu_si128 in alf_get_blk_stats_avx2()	2021-08-26 18:05:06 +03:00
Marko Viitanen	5df8add046	[alf] Change order of alf_covariance.y array for better AVX2 optimization in alf_get_blk_stats_avx2()	2021-08-26 15:37:01 +03:00
Marko Viitanen	be9527cf1d	[alf] Change the order of alf_covariance.ee values to get better optimized solution for alf_get_blk_stats_avx2()	2021-08-26 11:07:13 +03:00
Marko Viitanen	f4de5cfd0f	[alf] Cleanup alf_calc_covariance_avx2() and use integers in alf_get_blk_stats_avx2()	2021-08-26 10:20:57 +03:00
Marko Viitanen	915bf3ca24	[alf] Fix AVX2 priority	2021-08-25 20:29:58 +03:00
Marko Viitanen	8ef3e6a126	[alf] Add strategy for alf_get_blk_stats() and an initial AVX2 version	2021-08-25 20:22:24 +03:00
Marko Viitanen	f61b9138cd	[alf] Import SSE4.1 optimized 5x5 and 7x7 filters from VTM13 * Modified to work with 8-bit pixels	2021-08-25 11:50:37 +03:00
Marko Viitanen	dc6a29b0d8	[alf] Initial generic strategies for 5x5 and 7x7 filtering	2021-08-25 10:50:00 +03:00
Marko Viitanen	c3c96d69c2	[alf] Add modified alf_derive_classification_blk_sse41() from VTM 13.0 * Modified to work with bitdepth 8	2021-08-20 11:45:02 +03:00
Marko Viitanen	b158d05bca	[alf] rename strategy function to include prefix	2021-08-19 17:19:17 +03:00
Marko Viitanen	3efaeede76	[alf] Define the strategy for alf_derive_classification_blk()	2021-08-19 17:04:35 +03:00
Marko Viitanen	d742f57779	Remove angular_pred_avx2 so we don't need extra parameter	2021-08-15 10:43:48 +03:00
Marko Viitanen	5604b6f946	[cleanup] remove all crypto related stuff, fix warnings, move estimate.m to tools/	2021-07-27 09:27:51 +03:00
Marko Viitanen	99a2b0384d	[cleanup] remove some warnings	2021-07-26 11:42:19 +03:00
Marko Viitanen	0cad1ac3c9	[mts] Add a comment about idct8/idst7 16x16 being unoptimized	2021-07-21 14:02:23 +03:00
Marko Viitanen	d5ef036d35	[mts] change mts_subset tables back to static	2021-07-21 13:54:59 +03:00
Marko Viitanen	60caf2c378	[mts] fix 32x32 idst/idct	2021-07-21 13:44:25 +03:00
Marko Viitanen	c2cd5fb98e	[mts] replace AVX2 DST7/DCT8 16x16 with unoptimized for now	2021-07-21 13:38:17 +03:00
Marko Viitanen	7e089f518d	[mts] add optimized versions of DCT8 and DST7, inverse not yet working properly * Includes new unit tests for the mts	2021-07-21 11:53:15 +03:00
Marko Viitanen	7f67009511	Fix MD5 calculations from HEVC to VVC way	2021-06-24 15:03:29 +03:00
Marko Viitanen	c004735821	[LMCS] Fix casting of the chroma scaled residual	2021-06-18 09:35:06 +03:00
Joose Sainio	cfffd7166c	Use correct context for calculating coeff costs for transform skip	2021-06-07 13:06:03 +03:00
Marko Viitanen	4594bf0ca8	Merge branch 'lmcs_chroma'	2021-06-02 15:05:04 +03:00
Marko Viitanen	5babb14ee7	[LMCS] Use chroma scaling	2021-06-01 12:17:03 +03:00
Joose Sainio	f9de8ebc4f	Merge branch 'master' into '4x4-rd' # Conflicts: # src/encoder.c # tests/test_intra.sh	2021-05-28 11:43:55 +00:00
Marko Viitanen	dbc7fd48bf	[LMCS] Initialize some m_reshapeCW values to avoid division by zero	2021-05-24 18:57:37 +03:00
Marko Viitanen	73ac3b68bf	[LMCS] add missing header in quant-avx2.c	2021-05-24 17:25:38 +03:00
Marko Viitanen	4cd5bc38a1	[LMCS] Luma mapping working after some rework, have to keep the reconstruction in the mapped domain	2021-05-24 17:23:17 +03:00
Joose Sainio	cfd7d2666b	slightly optimize intra-generic.c	2021-05-14 10:23:37 +03:00
Joose Sainio	7674e94fd1	[rdoq] transform skip RDOQ Copy the implementation from VTM	2021-05-03 12:52:10 +03:00
Joose Sainio	d2b9893bb7	[transform skip] Fix misunderstanding that caused TS to use QP 52>=	2021-04-30 10:55:23 +03:00
Joose Sainio	a998f3ed74	[transform-skip] Convert the HEVC transform skip to VVC. For some reason transform skip uses QP MAX(52, QP) and the coeffs are no longer shifted	2021-04-30 10:55:23 +03:00
Joose Sainio	2ab005692d	Enable 4x4 intra CUs	2021-04-23 10:57:29 +03:00
Joose Sainio	1aaa95601c	Merge remote-tracking branch 'remotes/kvz_github/master' into Fix-monochrome # Conflicts: # .gitlab-ci.yml # build/kvazaar_lib/kvazaar_lib.vcxproj.filters # src/cfg.c # src/encoder.h # src/kvazaar.h # src/rdo.c	2021-04-23 10:56:50 +03:00
Joose Sainio	e8eab326fb	Update context selection to match VVC	2021-04-23 10:51:01 +03:00
Joose Sainio	b2076d3b39	Enable chroma scaling WIP: user defined scaling array	2021-03-16 10:31:26 +02:00
Joose Sainio	412781db41	[scalinglist] Fix quant-generic	2021-03-09 10:42:40 +02:00
Joose Sainio	30e573c261	[scalinglist] WIP: Update scalinglist for VVC. Seems to work when rdoq is enabled but not when it is disabled	2021-03-09 09:51:49 +02:00
Ari Lemmetti	dad3d6818e	Only read left and right border pixels if necessary	2021-03-08 22:36:10 +02:00
Ari Lemmetti	b72ab583b4	Handle "don't care" rows in the end separately	2021-03-08 22:36:09 +02:00
Ari Lemmetti	33295bf350	Use AVX2 luma interpolation for SMP and AMP as well	2021-03-08 22:36:09 +02:00
Ari Lemmetti	7ce68761c2	Add a reminder to fix a rare case for bipred	2021-03-08 22:36:09 +02:00
Ari Lemmetti	475f1d79d5	Add some defines for important interpolation related sizes	2021-03-08 22:36:09 +02:00
Ari Lemmetti	4314f3a9a7	Rename some interpolation functions and strategies for consistency	2021-03-08 22:36:08 +02:00
Ari Lemmetti	5a70b49f69	Require 64-bit build for AVX2 interpolation filter functions	2021-03-08 22:36:08 +02:00
Ari Lemmetti	5631651469	Remove unused functions and variables	2021-03-08 22:36:08 +02:00
Ari Lemmetti	e38219e489	Fix epol_func signature and function definition	2021-03-08 22:36:07 +02:00
Ari Lemmetti	7e6ba9750f	Add new AVX2 ip filters for chroma	2021-03-08 22:36:07 +02:00
Ari Lemmetti	3476fc62c7	Fix parameter to signed	2021-03-08 22:36:06 +02:00
Ari Lemmetti	e572066e46	Add new AVX2 vertical ip filter for pixel precision	2021-03-08 22:36:06 +02:00
Ari Lemmetti	9e4b62a891	Use the new horizontal filter for pixel precision as well	2021-03-08 22:36:06 +02:00
Ari Lemmetti	2175023843	Relocate function	2021-03-08 22:36:06 +02:00
Ari Lemmetti	f5b0e3c52b	Add new AVX2 horizontal ip filter capable of every luma PB	2021-03-08 22:36:05 +02:00
Ari Lemmetti	d9a3225ae5	Add new AVX2 vertical ip filter for high-precision	2021-03-08 22:36:05 +02:00
Ari Lemmetti	84222cf3e7	Replace old block extrapolation with more capable one. Separate paddings for different directions can be now specified.	2021-03-08 22:36:04 +02:00
Marko Viitanen	e05dcdb193	Enable sign hiding in quant_avx2 and fix a bug in kvz_encode_coeff_nxn_generic()	2021-02-12 16:40:28 +02:00
Marko Viitanen	79c36f6aeb	Enable RDOQ and sign hiding	2021-02-12 13:24:02 +02:00
Arttu Makinen	7098a94a6f	Implemented implicit MTS. Added selection of implicit MTS to command parameters. Updated the transform selection to support implicit MTS.	2021-02-11 15:11:15 +02:00
Arttu Mäkinen	8f34685a8f	Merge branch 'master' into 'mts' # Conflicts: # src/cfg.c # src/kvazaar.h	2021-02-10 13:05:18 +02:00
Arttu Makinen	c5570abe1b	Removed 'emt' variable from cu_info_t and changed 'emt' globally to 'mts' for consistency.	2021-02-10 12:08:05 +02:00
Arttu Makinen	d0b7dd95f7	MTS works on intra mode. Fixed usage of MTS constraints. Fixed DCT8 transforms. Added sorting function of MTS modes with intra modes and costs to search.c.	2021-02-10 11:01:58 +02:00
Arttu Makinen	2e7c342645	Implemented DCT2, DST7, and DCT8 transforms, and search for selecting transform for MTS. Using MTS results mismatch for luma component.	2021-02-02 11:09:43 +02:00
Arttu Makinen	b9c3336f0e	MTS bitstream encoding added for intra. Work with depths 0-3.	2021-01-18 20:44:36 +02:00
Arttu Makinen	98a8e78e93	avx2/encode_coding_tree-avx2.c update, because it caused errors	2020-12-30 14:25:16 +02:00
Pauli Oikkonen	816789c9f4	Allow fast coeff weights to be read from a file	2020-10-29 15:22:51 +02:00
Pauli Oikkonen	6799019db0	Move fast coeff table to transform.h Guess this is a more logical place for it	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	4712ce5f59	Round the fast coeff result instead of flooring	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	0fb09c9920	New filtered coeff weight by QP values	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	24d487f553	New weights for 12 <= QP <= 42 Trained using MSU ultrafast settings now	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	3e1c6d84b8	Fix issues in fast coeff estimation Allow weight table to start from nonzero QP, and round weights to Q8.8 instead of flooring them	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	5f91bda762	Use newer data for fast coeff cost estimation Same training dataset, but this time only buckets 0...3 were used to approximate the function, no sign/cg width bucket.	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	2abd733199	Use unsigned min() to correctly clip -32768. If a coeff happens to be -32768 (0x8000), its 16-bit abs() is also 0x8000. It should ultimately be clipped to 3, so interpret absolute values as unsigned instead to make that happen.	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	b93b90c0d7	Implement new fast coeff cost estimator in AVX2	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	2f74a112b3	Try first lookup table based fast coeff estimation	2020-10-29 15:20:27 +02:00
Marko Viitanen	2db3a07b14	Prevent cu_sig_model_chroma array from being indexed over the limit	2020-10-13 14:14:57 +03:00
Marko Viitanen	bddfb47a55	Merge remote-tracking branch 'remotes/kvazaar_github/master'	2020-09-25 11:49:11 +03:00
Marko Viitanen	449975b0fb	Fixed cubic filter usage in intra angular modes	2020-09-21 14:58:34 +03:00
Pauli Oikkonen	780da4568a	Exclude 8-bit-only code from 10-bit builds and use uint8_t instead of kvz_pixel for code that assumes 8-bit pixels	2020-09-02 17:46:33 +03:00
Marko Viitanen	574c4d06ee	Fix use of log2_cg_size in coeff coding -> smaller blocks also decoded correctly	2020-08-27 18:26:16 +03:00
Marko Viitanen	20b66c9949	Sync to VTM 8.2 and add separate height to last_sig coding	2020-04-29 08:52:38 +03:00
Jan Beich	1fa69c705d	Rename truncate() from `30ce461d98` to avoid conflict with POSIX version strategies/avx2/dct-avx2.c:55:23: error: static declaration of 'truncate' follows non-static declaration static INLINE __m256i truncate(__m256i v, __m256i debias, int32_t shift) ^ /usr/include/stdio.h:448:6: note: previous declaration is here int truncate(const char *, __off_t); ^	2020-04-22 16:09:42 +00:00
Marko Viitanen	86d76b19a4	Fix intra neighboring block selection and clean some unused code	2020-04-16 14:12:40 +03:00
Ari Lemmetti	f31dddc019	Bypass inverse quantization and inverse transform when trying early skip	2020-04-10 16:02:09 +03:00
Pauli Oikkonen	8617530b13	Use _mm_store_epi64 instead of _mm_cvtsi128_si64 Fix 32-bit builds that tend to lack the cvt intrinsic. Hope it will be optimized to a movq r64, xmm on modern platforms though	2020-04-07 23:51:54 +03:00
Pauli Oikkonen	a82966c0f5	Fix lacking _mm256_cvtss_f32 intrinsic on VS Cast __m256 into __m128 first, the XMM variant of the intrinsic has been around for a long enough time to be supported	2020-04-07 22:38:10 +03:00
Ari Lemmetti	901c25c0c8	Merge branch 'vaq'	2020-04-03 19:51:17 +03:00
Ari Lemmetti	51451be5ef	Handle cases where the number of pixels is not divisible by 32	2020-04-03 19:37:47 +03:00
siivonek	e5267f7706	Fix define for use with Visual Studio.	2020-04-03 15:11:01 +02:00
Pauli Oikkonen	addc1c3ede	Fix warning about potentially unused hsum_8x32b There's a lot of alternative options available, such as making it globally visible with a kvz_ prefix, force inlining it, or anything. This could be good too, hope it won't be compiled at all to translation units where it's not used.	2020-04-02 16:44:22 +03:00
siivonek	566680af7b	Move function hsum to file where it is used to avoid errors.	2020-04-02 14:03:06 +02:00
siivonek	58be514e2a	Fix pipeline error.	2020-04-02 13:50:08 +02:00
Pauli Oikkonen	99889dab15	Fix switch(bool) in picture-avx2.c It passes on GCC but warns on Clang	2020-03-31 15:42:19 +03:00
Jaakko Laitinen	af3d559d8d	Let pu-depth be defined per gop-layer	2020-03-17 17:57:18 +02:00
Pauli Oikkonen	60e7956dc5	Disable inaccurate integer variance calculation for now	2020-03-02 19:18:55 +02:00
Pauli Oikkonen	fc1b91335b	Implement variance calculation in integer math Maybe this is a bit faster than FP, it's not accurate though	2020-03-02 18:17:18 +02:00
Pauli Oikkonen	35c825c75f	Move hsum_8x32b to avx2_common_functions	2020-02-27 17:52:17 +02:00
Pauli Oikkonen	b00ac7d1c4	AVX2 version of buffer variance calculation	2020-02-25 15:57:56 +02:00
Pauli Oikkonen	1bd9c6dd93	Make a strategy out of pixel_var	2020-02-24 19:37:36 +02:00
Ari Lemmetti	3c7dd0752f	Remove the broken "no mov" branch. Causes hash mismatches for example in SlideShow sequence.	2020-02-03 15:26:31 +02:00
RLamm	30d5df40c5	Custom headers for the distributed coding	2020-01-29 15:54:49 +02:00
Pauli Oikkonen	c3d9e97e9f	Fix VS build	2019-12-12 18:34:55 +02:00
Pauli Oikkonen	7f238ca299	Remove debug print functions Whoops	2019-12-12 18:19:31 +02:00
Pauli Oikkonen	eefb5e50b3	De-inline pred_filtered_dc functions, shouldn't make much difference though	2019-12-12 17:30:00 +02:00
Pauli Oikkonen	169314de4f	32x32 filtered DC prediction in AVX2	2019-12-11 18:17:06 +02:00
Pauli Oikkonen	fb2481b7e4	16x16 filtered DC implemented in AVX2	2019-12-10 15:54:50 +02:00
Pauli Oikkonen	da370ea36d	Implement AVX2 8x8 filtered DC algorithm	2019-11-28 14:10:10 +02:00
Pauli Oikkonen	5d9b7019ca	Implement a 4x4 filtered DC pred function	2019-11-26 17:05:54 +02:00
Pauli Oikkonen	f1485ab087	Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes?	2019-11-25 15:20:29 +02:00
Marko Viitanen	eb2caf9118	Fix intra angle filter, changed from gauss filter table to run-time calculated 4-tap filter	2019-11-19 15:15:21 +02:00
Pauli Oikkonen	979d66031c	Create a strategy out of intra_pred_filtered_dc	2019-11-19 14:50:31 +02:00
Marko Viitanen	466d8772b0	Apply JVET_P0170_ZERO_POS_SIMPLIFICATION in coeff bypass coding	2019-11-19 14:32:38 +02:00
Pauli Oikkonen	fa4bb86406	Optimize intra_pred_planar_avx2 for 4x4 blocks	2019-11-19 13:39:02 +02:00
Marko Viitanen	17a53230fd	Code cleanup, remove unused arrays and remove tabs	2019-11-18 09:01:23 +02:00
Pauli Oikkonen	4761d228f9	Start to vectorize the 4x4 loop	2019-11-15 17:32:40 +02:00
Pauli Oikkonen	8d45ab4951	Stupidify the 4x4 planar loop for vectorization	2019-11-14 17:14:04 +02:00
Pauli Oikkonen	6d7a4f555c	Also remove 16x16 (A * B^T)^T matrix multiply. Can be done using (B * A^T) instead, it's the exact same	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	2c2deb2366	Tidy AVX2 32x32 matrix multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	98ad78b333	Tidy the old AVX2 32x32 matrix multiply It was actually a very good algorithm, just looked messy!	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	4a921cbdb5	Retain data as much in YMM registers as possible This seems to make it a whole lot quicker	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ac4d710e23	Unroll 32x32 matrix multiply, use all regs	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	a58608d0b8	Remove totally unnecessary (A * B^T)^T 32x32 multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	043f53539f	Implement a streamlined matrix-multiply 32x32 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e9da2d851b	Tidy 32x32 fast DCT's helper functions	2019-10-28 16:19:42 +02:00
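
Commits 6d7a4f555c and a58608d0b8 above drop the (A * B^T)^T multiply because it equals B * A^T. The following is only a minimal scalar sketch of that identity, not code from the repository: the 4x4 size, the flat row-major layout, and every function name here are assumptions made for illustration (the commits themselves concern 16x16 and 32x32 AVX2 routines).

```c
#include <stdint.h>

enum { N = 4 }; /* illustrative size only, chosen to keep the sketch small */

/* out = a * b^T for row-major N x N matrices (hypothetical helper). */
static void mul_a_bt(const int32_t *a, const int32_t *b, int32_t *out)
{
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      int32_t sum = 0;
      for (int k = 0; k < N; k++) {
        sum += a[i * N + k] * b[j * N + k];
      }
      out[i * N + j] = sum;
    }
  }
}

/* out = in^T (hypothetical helper). */
static void transpose(const int32_t *in, int32_t *out)
{
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      out[j * N + i] = in[i * N + j];
    }
  }
}

/* Returns 1 when (a * b^T)^T matches b * a^T element for element. */
static int transpose_identity_holds(const int32_t *a, const int32_t *b)
{
  int32_t ab_t[N * N], lhs[N * N], rhs[N * N];
  mul_a_bt(a, b, ab_t);   /* a * b^T            */
  transpose(ab_t, lhs);   /* (a * b^T)^T        */
  mul_a_bt(b, a, rhs);    /* b * a^T, no transpose pass needed */
  for (int i = 0; i < N * N; i++) {
    if (lhs[i] != rhs[i]) return 0;
  }
  return 1;
}

/* Fills two small sample matrices and checks the identity on them. */
static int demo_identity(void)
{
  int32_t a[N * N], b[N * N];
  for (int i = 0; i < N * N; i++) {
    a[i] = i * 3 - 7;
    b[i] = (i * i) % 11 - 5;
  }
  return transpose_identity_holds(a, b);
}
```

Swapping the operands produces the transposed result directly, which is why a separate transposing variant of the multiply becomes unnecessary.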


692 commits
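
The message of commit 2abd733199 in the log above describes a corner case: a 16-bit abs() of -32768 leaves the bit pattern 0x8000, which a signed min() treats as a negative number and so fails to clip. The following scalar sketch illustrates that reasoning under stated assumptions; it is not the repository's AVX2 code, and the function names and the `limit` parameter are hypothetical.

```c
#include <stdint.h>

/* Bit pattern left by a wrapping 16-bit abs on two's complement lanes.
 * abs16_bits(-32768) is 0x8000 because +32768 does not fit in int16_t. */
static uint16_t abs16_bits(int16_t v)
{
  uint16_t u = (uint16_t)v;
  return (v < 0) ? (uint16_t)(0u - u) : u;
}

/* Signed min: reads the bits as int16_t first, so 0x8000 becomes -32768
 * and compares as smaller than the limit. */
static int clip_signed(uint16_t abs_bits, int limit)
{
  int as_signed = (abs_bits & 0x8000u) ? (int)abs_bits - 0x10000
                                       : (int)abs_bits;
  return (as_signed < limit) ? as_signed : limit;
}

/* Unsigned min: reads the same bits as 32768, so the limit wins. */
static int clip_unsigned(uint16_t abs_bits, int limit)
{
  unsigned as_unsigned = abs_bits;
  return (as_unsigned < (unsigned)limit) ? (int)as_unsigned : limit;
}
```

For ordinary inputs the two clips agree, for example both map a coefficient of 7 to 3. Only the 0x8000 pattern separates them: the signed version returns -32768, while the unsigned version returns the intended 3.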