Commit graph

624 commits

Author SHA1 Message Date
Ari Lemmetti 7ccd1a571c [SIMD] Initial AVX2 code for 4-tap filtering in angular prediction. 2021-09-06 21:20:50 +03:00
Ari Lemmetti 20f0ff976d [SIMD] Transform angular pred loops for SIMD processing. 2021-09-06 21:20:49 +03:00
Ari Lemmetti 3dfe09e850 [SIMD] Copy generic implementation of angular prediction as a skeleton. 2021-09-06 21:20:46 +03:00
Joose Sainio 450cbd356c Merge branch 'joint_cbcr' into 'master'
[jccr] Add joint coding of chroma residual

See merge request cs/ultravideo/vvc/uvg266!6
2021-09-06 11:43:06 +03:00
Joose Sainio 0592cc65a0 [jccr] enable rdoq with jccr 2021-09-06 11:28:20 +03:00
Joose Sainio 072b84711a [jccr] fix 64×64 CUs 2021-09-06 11:28:20 +03:00
Joose Sainio 042b5078d8 [jccr] WIP initial implementation
Add some kind of search for joint chroma residual coding.
Bitstream is currently correct but prediction is incorrect because the jccr
is actually not used in the search.

Hard coded to be enabled
2021-09-06 11:28:08 +03:00
Marko Viitanen 26f18865f7 [alf] Change the processing in alf_get_blk_stats_avx2() to allow utilizing the whole 256-bit register 2021-08-27 13:40:28 +03:00
Marko Viitanen fdf125f406 [alf] Fix incorrect conversion in alf_get_blk_stats_avx2 2021-08-27 10:25:20 +03:00
Marko Viitanen 6714973264 [alf] Change _mm_store_si128 to _mm_storeu_si128 in alf_get_blk_stats_avx2() 2021-08-26 18:05:06 +03:00
Marko Viitanen 5df8add046 [alf] Change order of alf_covariance.y array for better AVX2 optimization in alf_get_blk_stats_avx2() 2021-08-26 15:37:01 +03:00
Marko Viitanen be9527cf1d [alf] Change the order of alf_covariance.ee values to get better optimized solution for alf_get_blk_stats_avx2() 2021-08-26 11:07:13 +03:00
Marko Viitanen f4de5cfd0f [alf] Cleanup alf_calc_covariance_avx2() and use integers in alf_get_blk_stats_avx2() 2021-08-26 10:20:57 +03:00
Marko Viitanen 915bf3ca24 [alf] Fix AVX2 priority 2021-08-25 20:29:58 +03:00
Marko Viitanen 8ef3e6a126 [alf] Add strategy for alf_get_blk_stats() and an initial AVX2 version 2021-08-25 20:22:24 +03:00
Marko Viitanen f61b9138cd [alf] Import SSE4.1 optimized 5x5 and 7x7 filters from VTM13
* Modified to work with 8-bit pixels
2021-08-25 11:50:37 +03:00
Marko Viitanen dc6a29b0d8 [alf] Initial generic strategies for 5x5 and 7x7 filtering 2021-08-25 10:50:00 +03:00
Marko Viitanen c3c96d69c2 [alf] Add modified alf_derive_classification_blk_sse41() from VTM 13.0
* Modified to work with bitdepth 8
2021-08-20 11:45:02 +03:00
Marko Viitanen b158d05bca [alf] rename strategy function to include prefix 2021-08-19 17:19:17 +03:00
Marko Viitanen 3efaeede76 [alf] Define the strategy for alf_derive_classification_blk() 2021-08-19 17:04:35 +03:00
Marko Viitanen d742f57779 Remove angular_pred_avx2 so we don't need extra parameter 2021-08-15 10:43:48 +03:00
Marko Viitanen 5604b6f946 [cleanup] remove all crypto related stuff, fix warnings, move estimate.m to tools/ 2021-07-27 09:27:51 +03:00
Marko Viitanen 99a2b0384d [cleanup] remove some warnings 2021-07-26 11:42:19 +03:00
Marko Viitanen 0cad1ac3c9 [mts] Add a comment about idct8/idst7 16x16 being unoptimized 2021-07-21 14:02:23 +03:00
Marko Viitanen d5ef036d35 [mts] change mts_subset tables back to static 2021-07-21 13:54:59 +03:00
Marko Viitanen 60caf2c378 [mts] fix 32x32 idst/idct 2021-07-21 13:44:25 +03:00
Marko Viitanen c2cd5fb98e [mts] replace AVX2 DST7/DCT8 16x16 with unoptimized for now 2021-07-21 13:38:17 +03:00
Marko Viitanen 7e089f518d [mts] add optimized versions of DCT8 and DST7, inverse not yet working properly
* Includes new unit tests for the mts
2021-07-21 11:53:15 +03:00
Marko Viitanen 7f67009511 Fix MD5 calculations from HEVC to VVC way 2021-06-24 15:03:29 +03:00
Marko Viitanen c004735821 [LMCS] Fix casting of the chroma scaled residual 2021-06-18 09:35:06 +03:00
Joose Sainio cfffd7166c Use correct context for calculating coeff costs for transform skip 2021-06-07 13:06:03 +03:00
Marko Viitanen 4594bf0ca8 Merge branch 'lmcs_chroma' 2021-06-02 15:05:04 +03:00
Marko Viitanen 5babb14ee7 [LMCS] Use chroma scaling 2021-06-01 12:17:03 +03:00
Joose Sainio f9de8ebc4f Merge branch 'master' into '4x4-rd'
# Conflicts:
#   src/encoder.c
#   tests/test_intra.sh
2021-05-28 11:43:55 +00:00
Marko Viitanen dbc7fd48bf [LMCS] Initialize some m_reshapeCW values to avoid division by zero 2021-05-24 18:57:37 +03:00
Marko Viitanen 73ac3b68bf [LMCS] add missing header in quant-avx2.c 2021-05-24 17:25:38 +03:00
Marko Viitanen 4cd5bc38a1 [LMCS] Luma mapping working after some rework, have to keep the reconstruction in the mapped domain 2021-05-24 17:23:17 +03:00
Joose Sainio cfd7d2666b slightly optimize intra-generic.c 2021-05-14 10:23:37 +03:00
Joose Sainio 7674e94fd1 [rdoq] transform skip RDOQ
Copy the implementation from VTM
2021-05-03 12:52:10 +03:00
Joose Sainio d2b9893bb7 [transform skip] Fix misunderstanding that caused TS to use QP >= 52 2021-04-30 10:55:23 +03:00
Joose Sainio a998f3ed74 [transform-skip] Convert the HEVC transform skip to VVC
For some reason transform skip uses QP MAX(52, QP) and the coeffs are
no longer shifted
2021-04-30 10:55:23 +03:00
Joose Sainio 2ab005692d Enable 4x4 intra CUs 2021-04-23 10:57:29 +03:00
Joose Sainio 1aaa95601c Merge remote-tracking branch 'remotes/kvz_github/master' into Fix-monochrome
# Conflicts:
#	.gitlab-ci.yml
#	build/kvazaar_lib/kvazaar_lib.vcxproj.filters
#	src/cfg.c
#	src/encoder.h
#	src/kvazaar.h
#	src/rdo.c
2021-04-23 10:56:50 +03:00
Joose Sainio e8eab326fb Update context selection to match VVC 2021-04-23 10:51:01 +03:00
Joose Sainio b2076d3b39 Enable chroma scaling
WIP: user defined scaling array
2021-03-16 10:31:26 +02:00
Joose Sainio 412781db41 [scalinglist] Fix quant-generic 2021-03-09 10:42:40 +02:00
Joose Sainio 30e573c261 [scalinglist] WIP: Update scalinglist for VVC
Seems to work when rdoq is enabled but not when it is disabled
2021-03-09 09:51:49 +02:00
Ari Lemmetti dad3d6818e Only read left and right border pixels if necessary 2021-03-08 22:36:10 +02:00
Ari Lemmetti b72ab583b4 Handle "don't care" rows in the end separately 2021-03-08 22:36:09 +02:00
Ari Lemmetti 33295bf350 Use AVX2 luma interpolation for SMP and AMP as well 2021-03-08 22:36:09 +02:00
Ari Lemmetti 7ce68761c2 Add a reminder to fix a rare case for bipred 2021-03-08 22:36:09 +02:00
Ari Lemmetti 475f1d79d5 Add some defines for important interpolation related sizes 2021-03-08 22:36:09 +02:00
Ari Lemmetti 4314f3a9a7 Rename some interpolation functions and strategies for consistency 2021-03-08 22:36:08 +02:00
Ari Lemmetti 5a70b49f69 Require 64-bit build for AVX2 interpolation filter functions 2021-03-08 22:36:08 +02:00
Ari Lemmetti 5631651469 Remove unused functions and variables 2021-03-08 22:36:08 +02:00
Ari Lemmetti e38219e489 Fix epol_func signature and function definition 2021-03-08 22:36:07 +02:00
Ari Lemmetti 7e6ba9750f Add new AVX2 ip filters for chroma 2021-03-08 22:36:07 +02:00
Ari Lemmetti 3476fc62c7 Fix parameter to signed 2021-03-08 22:36:06 +02:00
Ari Lemmetti e572066e46 Add new AVX2 vertical ip filter for pixel precision 2021-03-08 22:36:06 +02:00
Ari Lemmetti 9e4b62a891 Use the new horizontal filter for pixel precision as well 2021-03-08 22:36:06 +02:00
Ari Lemmetti 2175023843 Relocate function 2021-03-08 22:36:06 +02:00
Ari Lemmetti f5b0e3c52b Add new AVX2 horizontal ip filter capable of every luma PB 2021-03-08 22:36:05 +02:00
Ari Lemmetti d9a3225ae5 Add new AVX2 vertical ip filter for high-precision 2021-03-08 22:36:05 +02:00
Ari Lemmetti 84222cf3e7 Replace old block extrapolation with more capable one.
Separate paddings for different directions can be now specified.
2021-03-08 22:36:04 +02:00
Marko Viitanen e05dcdb193 Enable sign hiding in quant_avx2 and fix a bug in kvz_encode_coeff_nxn_generic() 2021-02-12 16:40:28 +02:00
Marko Viitanen 79c36f6aeb Enable RDOQ and sign hiding 2021-02-12 13:24:02 +02:00
Arttu Makinen 7098a94a6f Implemented implicit MTS.
Added selection of implicit MTS to command parameters.
Updated the transform selection to support implicit MTS.
2021-02-11 15:11:15 +02:00
Arttu Mäkinen 8f34685a8f Merge branch 'master' into 'mts'
# Conflicts:
#   src/cfg.c
#   src/kvazaar.h
2021-02-10 13:05:18 +02:00
Arttu Makinen c5570abe1b Removed 'emt' variable from cu_info_t and changed 'emt' globally to 'mts' for consistency. 2021-02-10 12:08:05 +02:00
Arttu Makinen d0b7dd95f7 MTS works on intra mode.
Fixed usage of MTS constraints.
Fixed DCT8 transforms.
Added sorting function of MTS modes with intra modes and costs to search.c.
2021-02-10 11:01:58 +02:00
Arttu Makinen 2e7c342645 Implemented DCT2, DST7, and DCT8 transforms, and a search for selecting the transform for MTS. Using MTS results in a mismatch for the luma component. 2021-02-02 11:09:43 +02:00
Arttu Makinen b9c3336f0e MTS bitstream encoding added for intra. Works with depths 0-3. 2021-01-18 20:44:36 +02:00
Arttu Makinen 98a8e78e93 avx2/encode_coding_tree-avx2.c update, because it caused errors 2020-12-30 14:25:16 +02:00
Pauli Oikkonen 816789c9f4 Allow fast coeff weights to be read from a file 2020-10-29 15:22:51 +02:00
Pauli Oikkonen 6799019db0 Move fast coeff table to transform.h
Guess this is a more logical place for it
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 4712ce5f59 Round the fast coeff result instead of flooring 2020-10-29 15:20:27 +02:00
Pauli Oikkonen 0fb09c9920 New filtered coeff weight by QP values 2020-10-29 15:20:27 +02:00
Pauli Oikkonen 24d487f553 New weights for 12 <= QP <= 42
Trained using MSU ultrafast settings now
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 3e1c6d84b8 Fix issues in fast coeff estimation
Allow weight table to start from nonzero QP, and round weights to Q8.8
instead of flooring them
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 5f91bda762 Use newer data for fast coeff cost estimation
Same training dataset, but this time only buckets 0...3 were used to
approximate the function, no sign/cg width bucket.
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 2abd733199 Use unsigned min() to correctly clip -32768
If a coeff happens to be -32768 (0x8000), its 16-bit abs() is also
0x8000. It should ultimately be clipped to 3, so interpret absolute
values as unsigned instead to make that happen.
2020-10-29 15:20:27 +02:00
Pauli Oikkonen b93b90c0d7 Implement new fast coeff cost estimator in AVX2 2020-10-29 15:20:27 +02:00
Pauli Oikkonen 2f74a112b3 Try first lookup table based fast coeff estimation 2020-10-29 15:20:27 +02:00
Marko Viitanen 2db3a07b14 Prevent cu_sig_model_chroma array from being indexed over the limit 2020-10-13 14:14:57 +03:00
Marko Viitanen bddfb47a55 Merge remote-tracking branch 'remotes/kvazaar_github/master' 2020-09-25 11:49:11 +03:00
Marko Viitanen 449975b0fb Fixed cubic filter usage in intra angular modes 2020-09-21 14:58:34 +03:00
Pauli Oikkonen 780da4568a Exclude 8-bit-only code from 10-bit builds and use uint8_t instead of kvz_pixel for code that assumes 8-bit pixels 2020-09-02 17:46:33 +03:00
Marko Viitanen 574c4d06ee Fix use of log2_cg_size in coeff coding -> smaller blocks also decoded correctly 2020-08-27 18:26:16 +03:00
Marko Viitanen 20b66c9949 Sync to VTM 8.2 and add separate height to last_sig coding 2020-04-29 08:52:38 +03:00
Jan Beich 1fa69c705d Rename truncate() from 30ce461d98 to avoid conflict with POSIX version
strategies/avx2/dct-avx2.c:55:23: error: static declaration of 'truncate' follows non-static declaration
static INLINE __m256i truncate(__m256i v, __m256i debias, int32_t shift)
                      ^
/usr/include/stdio.h:448:6: note: previous declaration is here
int      truncate(const char *, __off_t);
         ^
2020-04-22 16:09:42 +00:00
Marko Viitanen 86d76b19a4 Fix intra neighboring block selection and clean some unused code 2020-04-16 14:12:40 +03:00
Ari Lemmetti f31dddc019 Bypass inverse quantization and inverse transform when trying early skip 2020-04-10 16:02:09 +03:00
Pauli Oikkonen 8617530b13 Use _mm_store_epi64 instead of _mm_cvtsi128_si64
Fix 32-bit builds that tend to lack the cvt intrinsic. Hope it will be
optimized to a movq r64, xmm on modern platforms though
2020-04-07 23:51:54 +03:00
Pauli Oikkonen a82966c0f5 Fix lacking _mm256_cvtss_f32 intrinsic on VS
Cast __m256 into __m128 first, the XMM variant of the intrinsic has been
around for a long enough time to be supported
2020-04-07 22:38:10 +03:00
Ari Lemmetti 901c25c0c8 Merge branch 'vaq' 2020-04-03 19:51:17 +03:00
Ari Lemmetti 51451be5ef Handle cases where the number of pixels is not divisible by 32 2020-04-03 19:37:47 +03:00
siivonek e5267f7706 Fix define for use with Visual Studio. 2020-04-03 15:11:01 +02:00
Pauli Oikkonen addc1c3ede Fix warning about potentially unused hsum_8x32b
There's a lot of alternative options available, such as making it
globally visible with a kvz_ prefix, force inlining it, or anything.
This could be good too, hope it won't be compiled at all to translation
units where it's not used.
2020-04-02 16:44:22 +03:00
siivonek 566680af7b Move function hsum to file where it is used to avoid errors. 2020-04-02 14:03:06 +02:00
siivonek 58be514e2a Fix pipeline error. 2020-04-02 13:50:08 +02:00
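
Note on commit 2abd733199 ("Use unsigned min() to correctly clip -32768"): the reasoning in that message can be illustrated with a scalar sketch. This is not the code from the commit, which targets AVX2; the function names below are hypothetical, and the signed reinterpretation assumes the usual two's complement wrap-around. The point is that a 16-bit abs() of -32768 leaves the bit pattern 0x8000, which a signed min() still sees as -32768, while an unsigned min() sees 32768 and clips it to the limit.

```c
#include <stdint.h>

/* Scalar stand-in for a 16-bit abs that wraps on overflow:
 * abs(-32768) is not representable in int16_t, so the bit pattern
 * stays 0x8000. The result is returned as raw bits in a uint16_t.
 * (Hypothetical helper, for illustration only.) */
uint16_t abs16_wrapping(int16_t v)
{
    uint16_t bits = (uint16_t)v;
    return (v < 0) ? (uint16_t)(0u - bits) : bits;
}

/* Signed clip: reinterprets the bits as int16_t before comparing.
 * Assumption: the conversion wraps as on two's complement targets,
 * so 0x8000 becomes -32768 and slips under the limit. */
int16_t clip_signed(uint16_t abs_bits, int16_t limit)
{
    int16_t s = (int16_t)abs_bits;
    return (s < limit) ? s : limit;
}

/* Unsigned clip: compares the raw bits as unsigned, so 0x8000
 * is treated as 32768 and is clipped to the limit. */
uint16_t clip_unsigned(uint16_t abs_bits, uint16_t limit)
{
    return (abs_bits < limit) ? abs_bits : limit;
}
```

With a limit of 3, as in the commit message, clip_signed(abs16_wrapping(-32768), 3) yields -32768, whereas clip_unsigned(abs16_wrapping(-32768), 3) yields 3. For every other input the two variants agree.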