Commit graph

692 commits

Author SHA1 Message Date
Ari Lemmetti 146298a0df New AVX2 block averaging *WIP* missing small chroma block and SMP/AMP 2021-11-08 23:01:13 +02:00
Ari Lemmetti ef69c65c58 New bipred average functions 2021-11-08 23:01:12 +02:00
Ari Lemmetti f47bd5d86f Rename some bipred functions 2021-11-08 23:01:12 +02:00
Ari Lemmetti b52a930bed About working with generics 2021-11-08 23:01:12 +02:00
Ari Lemmetti e7857cbb24 Remove avx2 blending 2021-11-08 22:45:45 +02:00
Marko Viitanen 4a42b5cbc4 [cleanup] Remove HMVP debug code and extra arrays in intra coding 2021-11-08 10:11:17 +02:00
Marko Viitanen 73c4128100 [quant] Map scalinglistType correctly 2021-10-29 09:10:15 +03:00
Marko Viitanen 492d22e8be Disable interpolation AVX2 optimizations for now 2021-10-29 08:43:52 +03:00
Marko Viitanen 57883369ca Change all the license texts in source headers and LICENSE file to 3-clause BSD, closes #302
* All now have the same exact text string
2021-10-13 15:22:46 +03:00
Ari Lemmetti 171b9c60b3 [SIMD] Convert planar and DC mode PDPC loops to AVX2 2021-09-08 03:40:38 +03:00
Ari Lemmetti ad35d4a4c8 [SIMD] Loop transformation, prepare data for latter loop 2021-09-06 22:38:37 +03:00
Ari Lemmetti 22da8cfe65 [SIMD] Loop transformations for SIMD processing 2021-09-06 22:30:36 +03:00
Ari Lemmetti c195d906d3 [SIMD] Copy generic implementation of planar/DC PDPC as a skeleton 2021-09-06 21:20:51 +03:00
Ari Lemmetti c6b33c7b92 [SIMD] Move PDPC condition out of strategy 2021-09-06 21:20:51 +03:00
Ari Lemmetti 46cf9b6871 [SIMD] Make strategy out of PDPC for planar and DC 2021-09-06 21:20:51 +03:00
Ari Lemmetti 816e7a5a91 [SIMD] Replace PDPC remainder loop with masking operations 2021-09-06 21:20:51 +03:00
Ari Lemmetti 1926b4cc27 [SIMD] Initial AVX2 code for transpose in angular prediction 2021-09-06 21:20:50 +03:00
Ari Lemmetti 913573baca [SIMD] Initial AVX2 code for PDPC in angular prediction 2021-09-06 21:20:50 +03:00
Ari Lemmetti 7ccd1a571c [SIMD] Initial AVX2 code for 4-tap filtering in angular prediction. 2021-09-06 21:20:50 +03:00
Ari Lemmetti 20f0ff976d [SIMD] Transform angular pred loops for SIMD processing. 2021-09-06 21:20:49 +03:00
Ari Lemmetti 3dfe09e850 [SIMD] Copy generic implementation of angular prediction as a skeleton. 2021-09-06 21:20:46 +03:00
Joose Sainio 450cbd356c Merge branch 'joint_cbcr' into 'master'
[jccr] Add joint coding of chroma residual

See merge request cs/ultravideo/vvc/uvg266!6
2021-09-06 11:43:06 +03:00
Joose Sainio 0592cc65a0 [jccr] enable rdoq with jccr 2021-09-06 11:28:20 +03:00
Joose Sainio 072b84711a [jccr] fix 64×64 CUs 2021-09-06 11:28:20 +03:00
Joose Sainio 042b5078d8 [jccr] WIP initial implementation
Add somekind of search for joint chroma residual coding.
Bitstream is currently correct but prediction is incorrect because the jccr
is actually not used in the search.

Hard coded to be enabled
2021-09-06 11:28:08 +03:00
Marko Viitanen 26f18865f7 [alf] Change the processing in alf_get_blk_stats_avx2() to allow utilizing the whole 256bit register 2021-08-27 13:40:28 +03:00
Marko Viitanen fdf125f406 [alf] Fix incorrect conversion in alf_get_blk_stats_avx2 2021-08-27 10:25:20 +03:00
Marko Viitanen 6714973264 [alf] Change _mm_store_si128 to _mm_storeu_si128 in alf_get_blk_stats_avx2() 2021-08-26 18:05:06 +03:00
Marko Viitanen 5df8add046 [alf] Change order of alf_covariance.y array for better AVX2 optimization in alf_get_blk_stats_avx2() 2021-08-26 15:37:01 +03:00
Marko Viitanen be9527cf1d [alf] Change the order of alf_covariance.ee values to get better optimized solution for alf_get_blk_stats_avx2() 2021-08-26 11:07:13 +03:00
Marko Viitanen f4de5cfd0f [alf] Cleanup alf_calc_covariance_avx2() and use integers in alf_get_blk_stats_avx2() 2021-08-26 10:20:57 +03:00
Marko Viitanen 915bf3ca24 [alf] Fix AVX2 priority 2021-08-25 20:29:58 +03:00
Marko Viitanen 8ef3e6a126 [alf] Add strategy for alf_get_blk_stats() and an initial AVX2 version 2021-08-25 20:22:24 +03:00
Marko Viitanen f61b9138cd [alf] Import SSE4.1 optimized 5x5 and 7x7 filters from VTM13
* Modified to work with 8-bit pixels
2021-08-25 11:50:37 +03:00
Marko Viitanen dc6a29b0d8 [alf] Initial generic strategies for 5x5 and 7x7 filtering 2021-08-25 10:50:00 +03:00
Marko Viitanen c3c96d69c2 [alf] Add modified alf_derive_classification_blk_sse41() from VTM 13.0
* Modified to work with bitdepth 8
2021-08-20 11:45:02 +03:00
Marko Viitanen b158d05bca [alf] rename strategy function to include prefix 2021-08-19 17:19:17 +03:00
Marko Viitanen 3efaeede76 [alf] Define the strategy for alf_derive_classification_blk() 2021-08-19 17:04:35 +03:00
Marko Viitanen d742f57779 Remove angular_pred_avx2 so we don't need extra parameter 2021-08-15 10:43:48 +03:00
Marko Viitanen 5604b6f946 [cleanup] remove all crypto related stuff, fix warnings, move estimate.m to tools/ 2021-07-27 09:27:51 +03:00
Marko Viitanen 99a2b0384d [cleanup] remove some warnings 2021-07-26 11:42:19 +03:00
Marko Viitanen 0cad1ac3c9 [mts] Add a comment about idct8/idst7 16x16 being unoptimized 2021-07-21 14:02:23 +03:00
Marko Viitanen d5ef036d35 [mts] change mts_subset tables back to static 2021-07-21 13:54:59 +03:00
Marko Viitanen 60caf2c378 [mts] fix 32x32 idst/idct 2021-07-21 13:44:25 +03:00
Marko Viitanen c2cd5fb98e [mts] replace AVX2 DST7/DCT8 16x16 with unoptimized for now 2021-07-21 13:38:17 +03:00
Marko Viitanen 7e089f518d [mts] add optimized versions of DCT8 and DST7, inverse not yet working properly
* Includes new unit tests for the mts
2021-07-21 11:53:15 +03:00
Marko Viitanen 7f67009511 Fix MD5 calculations from HEVC to VVC way 2021-06-24 15:03:29 +03:00
Marko Viitanen c004735821 [LMCS] Fix casting of the chroma scaled residual 2021-06-18 09:35:06 +03:00
Joose Sainio cfffd7166c Use correct context for calculating coeff costs for transform skip 2021-06-07 13:06:03 +03:00
Marko Viitanen 4594bf0ca8 Merge branch 'lmcs_chroma' 2021-06-02 15:05:04 +03:00
Marko Viitanen 5babb14ee7 [LMCS] Use chroma scaling 2021-06-01 12:17:03 +03:00
Joose Sainio f9de8ebc4f Merge branch 'master' into '4x4-rd'
# Conflicts:
#   src/encoder.c
#   tests/test_intra.sh
2021-05-28 11:43:55 +00:00
Marko Viitanen dbc7fd48bf [LMCS] Initialize some m_reshapeCW values to avoid division by zero 2021-05-24 18:57:37 +03:00
Marko Viitanen 73ac3b68bf [LMCS] add missing header in quant-avx2.c 2021-05-24 17:25:38 +03:00
Marko Viitanen 4cd5bc38a1 [LMCS] Luma mapping working after some rework, have to keep the reconstruction in the mapped domain 2021-05-24 17:23:17 +03:00
Joose Sainio cfd7d2666b slightly optimize intra-generic.c 2021-05-14 10:23:37 +03:00
Joose Sainio 7674e94fd1 [rdoq] transform skip RDOQ
Copy the implementation from VTM
2021-05-03 12:52:10 +03:00
Joose Sainio d2b9893bb7 [transform skip] Fix misunderstanding that caused TS to use QP 52>= 2021-04-30 10:55:23 +03:00
Joose Sainio a998f3ed74 [transform-skip] Convert the HEVC transfrom skip to VVC
For some reason transform skip uses QP MAX(52, QP) and the coeffs are
no longer shifted
2021-04-30 10:55:23 +03:00
Joose Sainio 2ab005692d Enable 4x4 intra CUs 2021-04-23 10:57:29 +03:00
Joose Sainio 1aaa95601c Merge remote-tracking branch 'remotes/kvz_github/master' into Fix-monochrome
# Conflicts:
#	.gitlab-ci.yml
#	build/kvazaar_lib/kvazaar_lib.vcxproj.filters
#	src/cfg.c
#	src/encoder.h
#	src/kvazaar.h
#	src/rdo.c
2021-04-23 10:56:50 +03:00
Joose Sainio e8eab326fb Update context selection to match VVC 2021-04-23 10:51:01 +03:00
Joose Sainio b2076d3b39 Enable chroma scaling
WIP: user defined scaling array
2021-03-16 10:31:26 +02:00
Joose Sainio 412781db41 [scalinglist] Fix quant-generic 2021-03-09 10:42:40 +02:00
Joose Sainio 30e573c261 [scalinglist] WIP: Update scalinglist for VVC
Seems to work when rdoq is enabled but not when it is disabled
2021-03-09 09:51:49 +02:00
Ari Lemmetti dad3d6818e Only read left and right border pixels if necessary 2021-03-08 22:36:10 +02:00
Ari Lemmetti b72ab583b4 Handle "don't care" rows in the end separately 2021-03-08 22:36:09 +02:00
Ari Lemmetti 33295bf350 Use AVX2 luma interpolation for SMP and AMP as well 2021-03-08 22:36:09 +02:00
Ari Lemmetti 7ce68761c2 Add a reminder to fix a rare case for bipred 2021-03-08 22:36:09 +02:00
Ari Lemmetti 475f1d79d5 Add some defines for important interpolation related sizes 2021-03-08 22:36:09 +02:00
Ari Lemmetti 4314f3a9a7 Rename some interpolation functions and strategies for consistency 2021-03-08 22:36:08 +02:00
Ari Lemmetti 5a70b49f69 Require 64-bit build for AVX2 interpolation filter functions 2021-03-08 22:36:08 +02:00
Ari Lemmetti 5631651469 Remove unused functions and variables 2021-03-08 22:36:08 +02:00
Ari Lemmetti e38219e489 Fix epol_func signature and function definition 2021-03-08 22:36:07 +02:00
Ari Lemmetti 7e6ba9750f Add new AVX2 ip filters for chroma 2021-03-08 22:36:07 +02:00
Ari Lemmetti 3476fc62c7 Fix parameter to signed 2021-03-08 22:36:06 +02:00
Ari Lemmetti e572066e46 Add new AVX2 vertical ip filter for pixel precision 2021-03-08 22:36:06 +02:00
Ari Lemmetti 9e4b62a891 Use the new horizontal filter for pixel precision as well 2021-03-08 22:36:06 +02:00
Ari Lemmetti 2175023843 Relocate function 2021-03-08 22:36:06 +02:00
Ari Lemmetti f5b0e3c52b Add new AVX2 horizontal ip filter capable of every luma PB 2021-03-08 22:36:05 +02:00
Ari Lemmetti d9a3225ae5 Add new AVX2 vertical ip filter for high-precision 2021-03-08 22:36:05 +02:00
Ari Lemmetti 84222cf3e7 Replace old block extrapolation with more capable one.
Separate paddings for different directions can be now specified.
2021-03-08 22:36:04 +02:00
Marko Viitanen e05dcdb193 Enable sign hiding in quant_avx2 and fix a bug in kvz_encode_coeff_nxn_generic() 2021-02-12 16:40:28 +02:00
Marko Viitanen 79c36f6aeb Enable RDOQ and sign hiding 2021-02-12 13:24:02 +02:00
Arttu Makinen 7098a94a6f Implemented implicit MTS.
Added selection of implicit MTS to command parameters.
Updated the transform selection to support implicit MTS.
2021-02-11 15:11:15 +02:00
Arttu Mäkinen 8f34685a8f Merge branch 'master' into 'mts'
# Conflicts:
#   src/cfg.c
#   src/kvazaar.h
2021-02-10 13:05:18 +02:00
Arttu Makinen c5570abe1b Removed 'emt' variable from cu_info_t and changed 'emt' globally to 'mts' for consistency. 2021-02-10 12:08:05 +02:00
Arttu Makinen d0b7dd95f7 MTS works on intra mode.
Fixed usage of MTS constraints.
Fixed DCT8 transforms.
Added sorting function of MTS modes with intra modes and costs to search.c.
2021-02-10 11:01:58 +02:00
Arttu Makinen 2e7c342645 Implemented DCT2, DST7, and DCT8 transforms, and search for selecting transform for MTS. Using MTS results mismatch for luma component. 2021-02-02 11:09:43 +02:00
Arttu Makinen b9c3336f0e MTS bitstream encoding added for intra. Work with depths 0-3. 2021-01-18 20:44:36 +02:00
Arttu Makinen 98a8e78e93 avx2/encode_coding_tree-avx2.c update, because it caused errors 2020-12-30 14:25:16 +02:00
Pauli Oikkonen 816789c9f4 Allow fast coeff weights to be read from a file 2020-10-29 15:22:51 +02:00
Pauli Oikkonen 6799019db0 Move fast coeff table to transform.h
Guess this is a more logical place for it
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 4712ce5f59 Round the fast coeff result instead of flooring 2020-10-29 15:20:27 +02:00
Pauli Oikkonen 0fb09c9920 New filtered coeff weight by QP values 2020-10-29 15:20:27 +02:00
Pauli Oikkonen 24d487f553 New weights for 12 <= QP <= 42
Trained using MSU ultrafast settings now
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 3e1c6d84b8 Fix issues in fast coeff estimation
Allow weight table to start from nonzero QP, and round weights to Q8.8
instead of flooring them
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 5f91bda762 Use newer data for fast coeff cost estimation
Same training dataset, but this time only buckets 0...3 were used to
approximate the function, no sign/cg width bucket.
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 2abd733199 Use unsigned min() to correctly clip -32768
If a coeff happens to be -32768 (0x8000), its 16-bit abs() is also
0x8000. It should ultimately be clipped to 3, so interpret absolute
values as unsigned instead to make that happen.
2020-10-29 15:20:27 +02:00
Pauli Oikkonen b93b90c0d7 Implement new fast coeff cost estimator in AVX2 2020-10-29 15:20:27 +02:00
Pauli Oikkonen 2f74a112b3 Try first lookup table based fast coeff estimation 2020-10-29 15:20:27 +02:00
Marko Viitanen 2db3a07b14 Prevent cu_sig_model_chroma array from being indexed over the limit 2020-10-13 14:14:57 +03:00
Marko Viitanen bddfb47a55 Merge remote-tracking branch 'remotes/kvazaar_github/master' 2020-09-25 11:49:11 +03:00
Marko Viitanen 449975b0fb Fixed cubic filter usage in intra angular modes 2020-09-21 14:58:34 +03:00
Pauli Oikkonen 780da4568a Exclude 8-bit-only code from 10-bit builds and use uint8_t instead of kvz_pixel for code that assumes 8-bit pixels 2020-09-02 17:46:33 +03:00
Marko Viitanen 574c4d06ee Fix use of log2_cg_size in coeff coding -> smaller blocks also decoded correctly 2020-08-27 18:26:16 +03:00
Marko Viitanen 20b66c9949 Sync to VTM 8.2 and add separate height to last_sig coding 2020-04-29 08:52:38 +03:00
Jan Beich 1fa69c705d Rename truncate() from 30ce461d98 to avoid conflict with POSIX version
strategies/avx2/dct-avx2.c:55:23: error: static declaration of 'truncate' follows non-static declaration
static INLINE __m256i truncate(__m256i v, __m256i debias, int32_t shift)
                      ^
/usr/include/stdio.h:448:6: note: previous declaration is here
int      truncate(const char *, __off_t);
         ^
2020-04-22 16:09:42 +00:00
Marko Viitanen 86d76b19a4 Fix intra neighboring block selection and clean some unused code 2020-04-16 14:12:40 +03:00
Ari Lemmetti f31dddc019 Bypass inverse quantization and inverse transform when trying early skip 2020-04-10 16:02:09 +03:00
Pauli Oikkonen 8617530b13 Use _mm_store_epi64 instead of _mm_cvtsi128_si64
Fix 32-bit builds that tend to lack the cvt intrinsic. Hope it will be
optimized to a movq r64, xmm on modern platforms though
2020-04-07 23:51:54 +03:00
Pauli Oikkonen a82966c0f5 Fix lacking _mm256_cvtss_f32 intrinsic on VS
Cast __m256 into __m128 first, the XMM variant of the intrinsic has been
around for a long enough time to be supported
2020-04-07 22:38:10 +03:00
Ari Lemmetti 901c25c0c8 Merge branch 'vaq' 2020-04-03 19:51:17 +03:00
Ari Lemmetti 51451be5ef Handle cases where the number of pixels is not divisible by 32 2020-04-03 19:37:47 +03:00
siivonek e5267f7706 Fix define for use with Visual Studio. 2020-04-03 15:11:01 +02:00
Pauli Oikkonen addc1c3ede Fix warning about potentially unused hsum_8x32b
There's a lot of alternative options available, such as making it
globally visible with a kvz_ prefix, force inlining it, or anything.
This could be good too, hope it won't be compiled at all to translation
units where it's not used.
2020-04-02 16:44:22 +03:00
siivonek 566680af7b Move function hsum to file where it is used to avoid errors. 2020-04-02 14:03:06 +02:00
siivonek 58be514e2a Fix pipeline error. 2020-04-02 13:50:08 +02:00
Pauli Oikkonen 99889dab15 Fix switch(bool) in picture-avx2.c
It passes on GCC but warns on Clang
2020-03-31 15:42:19 +03:00
Jaakko Laitinen af3d559d8d Let pu-depth be defined per gop-layer 2020-03-17 17:57:18 +02:00
Pauli Oikkonen 60e7956dc5 Disable inaccurate integer variance calculation for now 2020-03-02 19:18:55 +02:00
Pauli Oikkonen fc1b91335b Implement variance calculation in integer math
Maybe this is a bit faster than FP, it's not accurate though
2020-03-02 18:17:18 +02:00
Pauli Oikkonen 35c825c75f Move hsum_8x32b to avx2_common_functions 2020-02-27 17:52:17 +02:00
Pauli Oikkonen b00ac7d1c4 AVX2 version of buffer variance calculation 2020-02-25 15:57:56 +02:00
Pauli Oikkonen 1bd9c6dd93 Make a strategy out of pixel_var 2020-02-24 19:37:36 +02:00
Ari Lemmetti 3c7dd0752f Remove the broken "no mov" branch.
Causes hash mismatches for example in SlideShow sequence.
2020-02-03 15:26:31 +02:00
RLamm 30d5df40c5 Custom headers for the distributed coding 2020-01-29 15:54:49 +02:00
Pauli Oikkonen c3d9e97e9f Fix VS build 2019-12-12 18:34:55 +02:00
Pauli Oikkonen 7f238ca299 Remove debug print functions
Whoops
2019-12-12 18:19:31 +02:00
Pauli Oikkonen eefb5e50b3 De-inline pred_filtered_dc functions, shouldn't make much difference though 2019-12-12 17:30:00 +02:00
Pauli Oikkonen 169314de4f 32x32 filtered DC prediction in AVX2 2019-12-11 18:17:06 +02:00
Pauli Oikkonen fb2481b7e4 16x16 filtered DC implemented in AVX2 2019-12-10 15:54:50 +02:00
Pauli Oikkonen da370ea36d Implement AVX2 8x8 filtered DC algorithm 2019-11-28 14:10:10 +02:00
Pauli Oikkonen 5d9b7019ca Implement a 4x4 filtered DC pred function 2019-11-26 17:05:54 +02:00
Pauli Oikkonen f1485ab087 Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes? 2019-11-25 15:20:29 +02:00
Marko Viitanen eb2caf9118 Fix intra angle filter, changed from gauss filter table to run-time calculated 4-tap filter 2019-11-19 15:15:21 +02:00
Pauli Oikkonen 979d66031c Create a strategy out of intra_pred_filtered_dc 2019-11-19 14:50:31 +02:00
Marko Viitanen 466d8772b0 Apply JVET_P0170_ZERO_POS_SIMPLIFICATION in coeff bypass coding 2019-11-19 14:32:38 +02:00
Pauli Oikkonen fa4bb86406 Optimize intra_pred_planar_avx2 for 4x4 blocks 2019-11-19 13:39:02 +02:00
Marko Viitanen 17a53230fd Code cleanup, remove unused arrays and remove tabs 2019-11-18 09:01:23 +02:00
Pauli Oikkonen 4761d228f9 Start to vectorize the 4x4 loop 2019-11-15 17:32:40 +02:00
Pauli Oikkonen 8d45ab4951 Stupidify the 4x4 planar loop for vectorization 2019-11-14 17:14:04 +02:00
Pauli Oikkonen 6d7a4f555c Also remove 16x16 (A * B^T)^T matrix multiply
Can be done using (B * A^T) instead, it's the exact same
2019-10-28 16:19:42 +02:00
Pauli Oikkonen 2c2deb2366 Tidy AVX2 32x32 matrix multiply 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 98ad78b333 Tidy the old AVX2 32x32 matrix multiply
It was actually a very good algorithm, just looked messy!
2019-10-28 16:19:42 +02:00
Pauli Oikkonen 4a921cbdb5 Retain data as much in YMM registers as possible
This seems to make it a whole lot quicker
2019-10-28 16:19:42 +02:00
Pauli Oikkonen ac4d710e23 Unroll 32x32 matrix multiply, use all regs 2019-10-28 16:19:42 +02:00
Pauli Oikkonen a58608d0b8 Remove totally unnecessary (A * B^T)^T 32x32 multiply 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 043f53539f Implement a streamlined matrix-multiply 32x32 DCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen e9da2d851b Tidy 32x32 fast DCT's helper functions 2019-10-28 16:19:42 +02:00