Ari Lemmetti
146298a0df
New AVX2 block averaging *WIP* missing small chroma block and SMP/AMP
2021-11-08 23:01:13 +02:00
Ari Lemmetti
ef69c65c58
New bipred average functions
2021-11-08 23:01:12 +02:00
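Bi-prediction averaging combines the L0 and L1 motion-compensated predictions sample by sample. The following is a minimal scalar sketch, not the AVX2 code from these commits. It assumes 8-bit output and HEVC/VVC-style 14-bit intermediate precision, where the default average is (a + b + offset) >> shift with shift = 15 - bit depth. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp to the 8-bit sample range [0, 255]. */
static uint8_t clip_pixel(int v)
{
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Average two 8-bit predictions with rounding: (a + b + 1) >> 1. */
void bipred_avg_px(const uint8_t *l0, const uint8_t *l1, uint8_t *dst, size_t n)
{
  for (size_t i = 0; i < n; ++i) {
    dst[i] = (uint8_t)((l0[i] + l1[i] + 1) >> 1);
  }
}

/* Average two high-precision predictions (8-bit samples scaled by 1 << 6,
 * possibly negative after filtering) back to 8-bit output. */
void bipred_avg_hi(const int16_t *l0, const int16_t *l1, uint8_t *dst, size_t n)
{
  const int shift = 15 - 8;            /* 15 - bit depth */
  const int offset = 1 << (shift - 1);
  for (size_t i = 0; i < n; ++i) {
    const int sum = l0[i] + l1[i] + offset;
    /* Negative sums clip to 0; testing first avoids shifting a negative int. */
    dst[i] = sum < 0 ? 0 : clip_pixel(sum >> shift);
  }
}
```

When the high-precision inputs are exact 8-bit values scaled by 64, both functions give the same result. This makes a quick sanity check for a SIMD port.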
Ari Lemmetti
f47bd5d86f
Rename some bipred functions
2021-11-08 23:01:12 +02:00
Ari Lemmetti
b52a930bed
About working with generics
2021-11-08 23:01:12 +02:00
Ari Lemmetti
e7857cbb24
Remove avx2 blending
2021-11-08 22:45:45 +02:00
Marko Viitanen
4a42b5cbc4
[cleanup] Remove HMVP debug code and extra arrays in intra coding
2021-11-08 10:11:17 +02:00
Marko Viitanen
73c4128100
[quant] Map scalinglistType correctly
2021-10-29 09:10:15 +03:00
Marko Viitanen
492d22e8be
Disable interpolation AVX2 optimizations for now
2021-10-29 08:43:52 +03:00
Marko Viitanen
57883369ca
Change all the license texts in source headers and LICENSE file to 3-clause BSD, closes #302
...
* All now have the same exact text string
2021-10-13 15:22:46 +03:00
Ari Lemmetti
171b9c60b3
[SIMD] Convert planar and DC mode PDPC loops to AVX2
2021-09-08 03:40:38 +03:00
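For reference, VVC's PDPC for planar and DC blends each predicted sample with the reference samples above it and to its left. The weights halve as the distance from the block edge grows. Below is a scalar sketch written from that formula. It assumes 8-bit samples, and the names and buffer layout are hypothetical (this is not the uvg266 strategy signature).

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(v)) for v > 0. */
static int ilog2(int v)
{
  int r = 0;
  while (v > 1) { v >>= 1; ++r; }
  return r;
}

/* PDPC weight 32 >> ((pos << 1) >> scale). Shifts of 6 or more give 0,
 * handled explicitly so no shift reaches the width of int. */
static int pdpc_weight(int pos, int scale)
{
  const int s = (pos << 1) >> scale;
  return s >= 6 ? 0 : 32 >> s;
}

/* Apply planar/DC PDPC in place to a row-major width x height block.
 * top[x] is the reference sample above column x, left[y] the one left of row y. */
void pdpc_planar_dc(uint8_t *pred, int width, int height,
                    const uint8_t *top, const uint8_t *left)
{
  assert(width > 0 && height > 0);
  const int scale = (ilog2(width) + ilog2(height) - 2) >> 2;
  for (int y = 0; y < height; ++y) {
    const int wT = pdpc_weight(y, scale);
    for (int x = 0; x < width; ++x) {
      const int wL = pdpc_weight(x, scale);
      uint8_t *p = &pred[y * width + x];
      /* The three weights are non-negative and sum to 64, so the result stays in [0, 255]. */
      *p = (uint8_t)((wL * left[y] + wT * top[x] + (64 - wL - wT) * *p + 32) >> 6);
    }
  }
}
```

Both weights reach zero a few samples away from the block edges, so only the first rows and columns actually change.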
Ari Lemmetti
ad35d4a4c8
[SIMD] Loop transformation, prepare data for the latter loop
2021-09-06 22:38:37 +03:00
Ari Lemmetti
22da8cfe65
[SIMD] Loop transformations for SIMD processing
2021-09-06 22:30:36 +03:00
Ari Lemmetti
c195d906d3
[SIMD] Copy generic implementation of planar/DC PDPC as a skeleton
2021-09-06 21:20:51 +03:00
Ari Lemmetti
c6b33c7b92
[SIMD] Move PDPC condition out of strategy
2021-09-06 21:20:51 +03:00
Ari Lemmetti
46cf9b6871
[SIMD] Make strategy out of PDPC for planar and DC
2021-09-06 21:20:51 +03:00
Ari Lemmetti
816e7a5a91
[SIMD] Replace PDPC remainder loop with masking operations
2021-09-06 21:20:51 +03:00
Ari Lemmetti
1926b4cc27
[SIMD] Initial AVX2 code for transpose in angular prediction
2021-09-06 21:20:50 +03:00
Ari Lemmetti
913573baca
[SIMD] Initial AVX2 code for PDPC in angular prediction
2021-09-06 21:20:50 +03:00
Ari Lemmetti
7ccd1a571c
[SIMD] Initial AVX2 code for 4-tap filtering in angular prediction.
2021-09-06 21:20:50 +03:00
Ari Lemmetti
20f0ff976d
[SIMD] Transform angular pred loops for SIMD processing.
2021-09-06 21:20:49 +03:00
Ari Lemmetti
3dfe09e850
[SIMD] Copy generic implementation of angular prediction as a skeleton.
2021-09-06 21:20:46 +03:00
Joose Sainio
450cbd356c
Merge branch 'joint_cbcr' into 'master'
...
[jccr] Add joint coding of chroma residual
See merge request cs/ultravideo/vvc/uvg266!6
2021-09-06 11:43:06 +03:00
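In joint Cb-Cr residual coding, a single residual is transmitted and both chroma residuals are derived from it. Below is a sketch of that derivation as we understand it from the VVC specification: TuCResMode takes values 1 to 3, and CSign is +1 or -1, signalled at slice level. The function names are hypothetical, and this is not the uvg266 implementation.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 2), i.e. an arithmetic right shift by one, without relying
 * on implementation-defined >> of negative values. */
static int half_floor(int v)
{
  return v >= 0 ? v >> 1 : -((1 - v) >> 1);
}

/* Derive Cb and Cr residuals from the joint residual of one block.
 * mode is TuCResMode (1..3), c_sign is +1 or -1. */
void jccr_derive(const int16_t *joint, int16_t *res_cb, int16_t *res_cr,
                 int n, int mode, int c_sign)
{
  assert(c_sign == 1 || c_sign == -1);
  for (int i = 0; i < n; ++i) {
    const int r = joint[i];
    switch (mode) {
    case 1: res_cb[i] = (int16_t)r; res_cr[i] = (int16_t)half_floor(c_sign * r); break;
    case 2: res_cb[i] = (int16_t)r; res_cr[i] = (int16_t)(c_sign * r); break;
    case 3: res_cr[i] = (int16_t)r; res_cb[i] = (int16_t)half_floor(c_sign * r); break;
    default: return; /* mode 0: JCCR off, residuals are coded separately */
    }
  }
}
```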
Joose Sainio
0592cc65a0
[jccr] enable rdoq with jccr
2021-09-06 11:28:20 +03:00
Joose Sainio
072b84711a
[jccr] fix 64×64 CUs
2021-09-06 11:28:20 +03:00
Joose Sainio
042b5078d8
[jccr] WIP initial implementation
...
Add some kind of search for joint chroma residual coding.
Bitstream is currently correct but prediction is incorrect because the jccr
is actually not used in the search.
Hard-coded to be enabled
2021-09-06 11:28:08 +03:00
Marko Viitanen
26f18865f7
[alf] Change the processing in alf_get_blk_stats_avx2() to allow utilizing the whole 256bit register
2021-08-27 13:40:28 +03:00
Marko Viitanen
fdf125f406
[alf] Fix incorrect conversion in alf_get_blk_stats_avx2
2021-08-27 10:25:20 +03:00
Marko Viitanen
6714973264
[alf] Change _mm_store_si128 to _mm_storeu_si128 in alf_get_blk_stats_avx2()
2021-08-26 18:05:06 +03:00
Marko Viitanen
5df8add046
[alf] Change order of alf_covariance.y array for better AVX2 optimization in alf_get_blk_stats_avx2()
2021-08-26 15:37:01 +03:00
Marko Viitanen
be9527cf1d
[alf] Change the order of alf_covariance.ee values to get better optimized solution for alf_get_blk_stats_avx2()
2021-08-26 11:07:13 +03:00
Marko Viitanen
f4de5cfd0f
[alf] Cleanup alf_calc_covariance_avx2() and use integers in alf_get_blk_stats_avx2()
2021-08-26 10:20:57 +03:00
Marko Viitanen
915bf3ca24
[alf] Fix AVX2 priority
2021-08-25 20:29:58 +03:00
Marko Viitanen
8ef3e6a126
[alf] Add strategy for alf_get_blk_stats() and an initial AVX2 version
2021-08-25 20:22:24 +03:00
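The strategy commits in this log share one pattern. Several implementations of the same function are registered with a priority, and the highest-priority one the CPU supports is selected. Below is a simplified model of that idea. All types and names are hypothetical and are not kvazaar's strategy selector API. The CPU check is a stub rather than a real CPUID query.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*sum_func)(const int *v, int n);

typedef struct {
  const char *type;        /* e.g. "generic", "avx2" */
  unsigned priority;       /* higher wins among available entries */
  bool (*available)(void); /* CPU feature check */
  sum_func fptr;
} strategy;

/* Stub feature flag; real code would query the CPU at startup. */
static bool g_have_avx2 = false;
static bool always(void) { return true; }
static bool have_avx2(void) { return g_have_avx2; }

static int sum_generic(const int *v, int n)
{
  int s = 0;
  for (int i = 0; i < n; ++i) s += v[i];
  return s;
}

/* Stand-in for a SIMD version: same result, different code path. */
static int sum_unrolled(const int *v, int n)
{
  int s0 = 0, s1 = 0, i = 0;
  for (; i + 1 < n; i += 2) { s0 += v[i]; s1 += v[i + 1]; }
  if (i < n) s0 += v[i];
  return s0 + s1;
}

static const strategy g_strategies[] = {
  { "generic", 0, always, sum_generic },
  { "avx2", 40, have_avx2, sum_unrolled },
};

/* Pick the highest-priority implementation whose check passes. */
sum_func select_strategy(const strategy *list, size_t n)
{
  const strategy *best = NULL;
  for (size_t i = 0; i < n; ++i) {
    if (list[i].available() && (best == NULL || list[i].priority > best->priority)) {
      best = &list[i];
    }
  }
  return best ? best->fptr : NULL;
}
```

In this model, a mis-set priority does not break output. It only selects a different, possibly slower, implementation, so such bugs are easy to miss.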
Marko Viitanen
f61b9138cd
[alf] Import SSE4.1 optimized 5x5 and 7x7 filters from VTM13
...
* Modified to work with 8-bit pixels
2021-08-25 11:50:37 +03:00
Marko Viitanen
dc6a29b0d8
[alf] Initial generic strategies for 5x5 and 7x7 filtering
2021-08-25 10:50:00 +03:00
Marko Viitanen
c3c96d69c2
[alf] Add modified alf_derive_classification_blk_sse41() from VTM 13.0
...
* Modified to work with bitdepth 8
2021-08-20 11:45:02 +03:00
Marko Viitanen
b158d05bca
[alf] rename strategy function to include prefix
2021-08-19 17:19:17 +03:00
Marko Viitanen
3efaeede76
[alf] Define the strategy for alf_derive_classification_blk()
2021-08-19 17:04:35 +03:00
Marko Viitanen
d742f57779
Remove angular_pred_avx2 so we don't need extra parameter
2021-08-15 10:43:48 +03:00
Marko Viitanen
5604b6f946
[cleanup] remove all crypto related stuff, fix warnings, move estimate.m to tools/
2021-07-27 09:27:51 +03:00
Marko Viitanen
99a2b0384d
[cleanup] remove some warnings
2021-07-26 11:42:19 +03:00
Marko Viitanen
0cad1ac3c9
[mts] Add a comment about idct8/idst7 16x16 being unoptimized
2021-07-21 14:02:23 +03:00
Marko Viitanen
d5ef036d35
[mts] change mts_subset tables back to static
2021-07-21 13:54:59 +03:00
Marko Viitanen
60caf2c378
[mts] fix 32x32 idst/idct
2021-07-21 13:44:25 +03:00
Marko Viitanen
c2cd5fb98e
[mts] replace AVX2 DST7/DCT8 16x16 with unoptimized for now
2021-07-21 13:38:17 +03:00
Marko Viitanen
7e089f518d
[mts] add optimized versions of DCT8 and DST7, inverse not yet working properly
...
* Includes new unit tests for the mts
2021-07-21 11:53:15 +03:00
Marko Viitanen
7f67009511
Fix MD5 calculation to use the VVC method instead of the HEVC one
2021-06-24 15:03:29 +03:00
Marko Viitanen
c004735821
[LMCS] Fix casting of the chroma scaled residual
2021-06-18 09:35:06 +03:00
Joose Sainio
cfffd7166c
Use correct context for calculating coeff costs for transform skip
2021-06-07 13:06:03 +03:00
Marko Viitanen
4594bf0ca8
Merge branch 'lmcs_chroma'
2021-06-02 15:05:04 +03:00
Marko Viitanen
5babb14ee7
[LMCS] Use chroma scaling
2021-06-01 12:17:03 +03:00
Joose Sainio
f9de8ebc4f
Merge branch 'master' into '4x4-rd'
...
# Conflicts:
# src/encoder.c
# tests/test_intra.sh
2021-05-28 11:43:55 +00:00
Marko Viitanen
dbc7fd48bf
[LMCS] Initialize some m_reshapeCW values to avoid division by zero
2021-05-24 18:57:37 +03:00
Marko Viitanen
73ac3b68bf
[LMCS] add missing header in quant-avx2.c
2021-05-24 17:25:38 +03:00
Marko Viitanen
4cd5bc38a1
[LMCS] Luma mapping working after some rework, have to keep the reconstruction in the mapped domain
2021-05-24 17:23:17 +03:00
Joose Sainio
cfd7d2666b
slightly optimize intra-generic.c
2021-05-14 10:23:37 +03:00
Joose Sainio
7674e94fd1
[rdoq] transform skip RDOQ
...
Copy the implementation from VTM
2021-05-03 12:52:10 +03:00
Joose Sainio
d2b9893bb7
[transform skip] Fix misunderstanding that caused TS to use QP >= 52
2021-04-30 10:55:23 +03:00
Joose Sainio
a998f3ed74
[transform-skip] Convert the HEVC transform skip to VVC
...
For some reason transform skip uses QP MAX(52, QP) and the coeffs are
no longer shifted
2021-04-30 10:55:23 +03:00
Joose Sainio
2ab005692d
Enable 4x4 intra CUs
2021-04-23 10:57:29 +03:00
Joose Sainio
1aaa95601c
Merge remote-tracking branch 'remotes/kvz_github/master' into Fix-monochrome
...
# Conflicts:
# .gitlab-ci.yml
# build/kvazaar_lib/kvazaar_lib.vcxproj.filters
# src/cfg.c
# src/encoder.h
# src/kvazaar.h
# src/rdo.c
2021-04-23 10:56:50 +03:00
Joose Sainio
e8eab326fb
Update context selection to match VVC
2021-04-23 10:51:01 +03:00
Joose Sainio
b2076d3b39
Enable chroma scaling
...
WIP: user defined scaling array
2021-03-16 10:31:26 +02:00
Joose Sainio
412781db41
[scalinglist] Fix quant-generic
2021-03-09 10:42:40 +02:00
Joose Sainio
30e573c261
[scalinglist] WIP: Update scalinglist for VVC
...
Seems to work when rdoq is enabled but not when it is disabled
2021-03-09 09:51:49 +02:00
Ari Lemmetti
dad3d6818e
Only read left and right border pixels if necessary
2021-03-08 22:36:10 +02:00
Ari Lemmetti
b72ab583b4
Handle "don't care" rows in the end separately
2021-03-08 22:36:09 +02:00
Ari Lemmetti
33295bf350
Use AVX2 luma interpolation for SMP and AMP as well
2021-03-08 22:36:09 +02:00
Ari Lemmetti
7ce68761c2
Add a reminder to fix a rare case for bipred
2021-03-08 22:36:09 +02:00
Ari Lemmetti
475f1d79d5
Add some defines for important interpolation related sizes
2021-03-08 22:36:09 +02:00
Ari Lemmetti
4314f3a9a7
Rename some interpolation functions and strategies for consistency
2021-03-08 22:36:08 +02:00
Ari Lemmetti
5a70b49f69
Require 64-bit build for AVX2 interpolation filter functions
2021-03-08 22:36:08 +02:00
Ari Lemmetti
5631651469
Remove unused functions and variables
2021-03-08 22:36:08 +02:00
Ari Lemmetti
e38219e489
Fix epol_func signature and function definition
2021-03-08 22:36:07 +02:00
Ari Lemmetti
7e6ba9750f
Add new AVX2 ip filters for chroma
2021-03-08 22:36:07 +02:00
Ari Lemmetti
3476fc62c7
Fix parameter to signed
2021-03-08 22:36:06 +02:00
Ari Lemmetti
e572066e46
Add new AVX2 vertical ip filter for pixel precision
2021-03-08 22:36:06 +02:00
Ari Lemmetti
9e4b62a891
Use the new horizontal filter for pixel precision as well
2021-03-08 22:36:06 +02:00
Ari Lemmetti
2175023843
Relocate function
2021-03-08 22:36:06 +02:00
Ari Lemmetti
f5b0e3c52b
Add new AVX2 horizontal ip filter capable of every luma PB
2021-03-08 22:36:05 +02:00
Ari Lemmetti
d9a3225ae5
Add new AVX2 vertical ip filter for high-precision
2021-03-08 22:36:05 +02:00
Ari Lemmetti
84222cf3e7
Replace old block extrapolation with more capable one.
...
Separate paddings for different directions can be now specified.
2021-03-08 22:36:04 +02:00
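Block extrapolation provides reference samples for interpolation when the filter footprint extends outside the picture. The sketch below takes an independent padding amount for each side. It assumes the usual edge-replication rule, where out-of-picture coordinates clamp to the nearest edge sample. The function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int clamp_int(int v, int lo, int hi)
{
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Copy the w x h block at (x0, y0) of a src_w x src_h picture into dst,
 * extended by a separate padding on each side. Positions outside the
 * picture take the nearest edge sample. dst is row-major with width
 * pad_left + w + pad_right. */
void extrapolate_block(const uint8_t *src, int src_w, int src_h, int src_stride,
                       int x0, int y0, int w, int h,
                       int pad_left, int pad_right, int pad_top, int pad_bottom,
                       uint8_t *dst)
{
  assert(src_w > 0 && src_h > 0 && w > 0 && h > 0);
  const int dst_w = pad_left + w + pad_right;
  const int dst_h = pad_top + h + pad_bottom;
  for (int y = 0; y < dst_h; ++y) {
    const int sy = clamp_int(y0 - pad_top + y, 0, src_h - 1);
    for (int x = 0; x < dst_w; ++x) {
      const int sx = clamp_int(x0 - pad_left + x, 0, src_w - 1);
      dst[y * dst_w + x] = src[sy * src_stride + sx];
    }
  }
}
```

With per-side padding, a filter can request exactly the margin it needs on each side. For example, an 8-tap luma filter reads 3 samples to the left of the current position and 4 to the right.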
Marko Viitanen
e05dcdb193
Enable sign hiding in quant_avx2 and fix a bug in kvz_encode_coeff_nxn_generic()
2021-02-12 16:40:28 +02:00
Marko Viitanen
79c36f6aeb
Enable RDOQ and sign hiding
2021-02-12 13:24:02 +02:00
Arttu Makinen
7098a94a6f
Implemented implicit MTS.
...
Added selection of implicit MTS to command parameters.
Updated the transform selection to support implicit MTS.
2021-02-11 15:11:15 +02:00
Arttu Mäkinen
8f34685a8f
Merge branch 'master' into 'mts'
...
# Conflicts:
# src/cfg.c
# src/kvazaar.h
2021-02-10 13:05:18 +02:00
Arttu Makinen
c5570abe1b
Removed 'emt' variable from cu_info_t and changed 'emt' globally to 'mts' for consistency.
2021-02-10 12:08:05 +02:00
Arttu Makinen
d0b7dd95f7
MTS works on intra mode.
...
Fixed usage of MTS constraints.
Fixed DCT8 transforms.
Added sorting function of MTS modes with intra modes and costs to search.c.
2021-02-10 11:01:58 +02:00
Arttu Makinen
2e7c342645
Implemented DCT2, DST7, and DCT8 transforms, and a search for selecting the transform for MTS. Using MTS results in a mismatch for the luma component.
2021-02-02 11:09:43 +02:00
Arttu Makinen
b9c3336f0e
MTS bitstream encoding added for intra. Works with depths 0-3.
2021-01-18 20:44:36 +02:00
Arttu Makinen
98a8e78e93
Update avx2/encode_coding_tree-avx2.c because it caused errors
2020-12-30 14:25:16 +02:00
Pauli Oikkonen
816789c9f4
Allow fast coeff weights to be read from a file
2020-10-29 15:22:51 +02:00
Pauli Oikkonen
6799019db0
Move fast coeff table to transform.h
...
Guess this is a more logical place for it
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
4712ce5f59
Round the fast coeff result instead of flooring
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
0fb09c9920
New filtered coeff weight by QP values
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
24d487f553
New weights for 12 <= QP <= 42
...
Trained using MSU ultrafast settings now
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
3e1c6d84b8
Fix issues in fast coeff estimation
...
Allow weight table to start from nonzero QP, and round weights to Q8.8
instead of flooring them
2020-10-29 15:20:27 +02:00
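Two of the fast coeff commits here switch from flooring to rounding. The first applies when converting trained weights to Q8.8 fixed point (value * 256, 8 fractional bits). The second applies when dropping the fractional bits of the estimate. Below is a sketch of both conversions, assuming non-negative weights. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Real weight -> Q8.8: truncation floors, adding 0.5 first rounds. */
int32_t q88_from_floor(double w)
{
  assert(w >= 0.0);
  return (int32_t)(w * 256.0);
}

int32_t q88_from_round(double w)
{
  assert(w >= 0.0);
  return (int32_t)(w * 256.0 + 0.5);
}

/* count * weight with the 8 fractional bits dropped. */
uint32_t q88_mul_floor(uint32_t count, uint32_t w_q88)
{
  return (count * w_q88) >> 8;
}

uint32_t q88_mul_round(uint32_t count, uint32_t w_q88)
{
  return (count * w_q88 + 128) >> 8; /* +0.5 in Q8.8 before truncating */
}
```

Flooring biases every value downward by up to one unit, while rounding to nearest keeps the error within half a unit in either direction.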
Pauli Oikkonen
5f91bda762
Use newer data for fast coeff cost estimation
...
Same training dataset, but this time only buckets 0...3 were used to
approximate the function, no sign/cg width bucket.
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
2abd733199
Use unsigned min() to correctly clip -32768
...
If a coeff happens to be -32768 (0x8000), its 16-bit abs() is also
0x8000. It should ultimately be clipped to 3, so interpret absolute
values as unsigned instead to make that happen.
2020-10-29 15:20:27 +02:00
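The following is a scalar model of the -32768 corner case. A 16-bit SIMD abs such as `_mm256_abs_epi16` leaves 0x8000 unchanged. A signed 16-bit min such as `_mm256_min_epi16` then reads it as -32768 and does not clip it. An unsigned min such as `_mm256_min_epu16` reads 32768 and clips it to 3. The helpers are hypothetical, and which intrinsics the commit used is not stated here.

```c
#include <stdint.h>

/* Bit pattern of a wrapping 16-bit |v|: -32768 (0x8000) maps to itself. */
static uint16_t abs16_bits(int16_t v)
{
  const uint16_t u = (uint16_t)v; /* two's complement pattern */
  return v < 0 ? (uint16_t)(0u - u) : u;
}

/* Signed min: the pattern is reinterpreted as int16, so 0x8000 is -32768. */
int clip3_signed(int16_t coeff)
{
  const uint16_t bits = abs16_bits(coeff);
  const int a = bits >= 0x8000u ? (int)bits - 0x10000 : (int)bits;
  return a < 3 ? a : 3;
}

/* Unsigned min: 0x8000 is 32768 and is clipped to 3 like any large value. */
int clip3_unsigned(int16_t coeff)
{
  const uint16_t a = abs16_bits(coeff);
  return a < 3u ? (int)a : 3;
}
```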
Pauli Oikkonen
b93b90c0d7
Implement new fast coeff cost estimator in AVX2
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
2f74a112b3
Try first lookup table based fast coeff estimation
2020-10-29 15:20:27 +02:00
Marko Viitanen
2db3a07b14
Prevent cu_sig_model_chroma array from being indexed over the limit
2020-10-13 14:14:57 +03:00
Marko Viitanen
bddfb47a55
Merge remote-tracking branch 'remotes/kvazaar_github/master'
2020-09-25 11:49:11 +03:00
Marko Viitanen
449975b0fb
Fixed cubic filter usage in intra angular modes
2020-09-21 14:58:34 +03:00
Pauli Oikkonen
780da4568a
Exclude 8-bit-only code from 10-bit builds and use uint8_t instead of kvz_pixel for code that assumes 8-bit pixels
2020-09-02 17:46:33 +03:00
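A common way to keep 8-bit-only code out of higher-bit-depth builds is conditional compilation on a compile-time bit-depth setting. The sketch below assumes a hypothetical `BIT_DEPTH` macro and a `pixel_t` type that plays the role of `kvz_pixel`.

```c
#include <stdint.h>

#ifndef BIT_DEPTH
#define BIT_DEPTH 8 /* hypothetical build setting: 8 or 10 */
#endif

#if BIT_DEPTH == 8
typedef uint8_t pixel_t;
#else
typedef uint16_t pixel_t;
#endif

/* Generic code is written against pixel_t and builds at any bit depth. */
int sum_pixels(const pixel_t *p, int n)
{
  int s = 0;
  for (int i = 0; i < n; ++i) s += p[i];
  return s;
}

#if BIT_DEPTH == 8
/* Code that hard-codes 8-bit assumptions (here: packing four samples into
 * 32 bits) takes uint8_t explicitly and exists only in 8-bit builds. */
uint32_t pack4_u8(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
#endif
```

Callers of the 8-bit-only function must sit behind the same `#if`. A 10-bit build then fails at compile time instead of silently misreading 16-bit samples.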
Marko Viitanen
574c4d06ee
Fix use of log2_cg_size in coeff coding -> smaller blocks also decoded correctly
2020-08-27 18:26:16 +03:00
Marko Viitanen
20b66c9949
Sync to VTM 8.2 and add separate height to last_sig coding
2020-04-29 08:52:38 +03:00
Jan Beich
1fa69c705d
Rename truncate() from 30ce461d98 to avoid conflict with POSIX version
...
strategies/avx2/dct-avx2.c:55:23: error: static declaration of 'truncate' follows non-static declaration
static INLINE __m256i truncate(__m256i v, __m256i debias, int32_t shift)
^
/usr/include/stdio.h:448:6: note: previous declaration is here
int truncate(const char *, __off_t);
^
2020-04-22 16:09:42 +00:00
Marko Viitanen
86d76b19a4
Fix intra neighboring block selection and clean some unused code
2020-04-16 14:12:40 +03:00
Ari Lemmetti
f31dddc019
Bypass inverse quantization and inverse transform when trying early skip
2020-04-10 16:02:09 +03:00
Pauli Oikkonen
8617530b13
Use _mm_store_epi64 instead of _mm_cvtsi128_si64
...
Fix 32-bit builds that tend to lack the cvt intrinsic. Hope it will be
optimized to a movq r64, xmm on modern platforms though
2020-04-07 23:51:54 +03:00
Pauli Oikkonen
a82966c0f5
Fix lacking _mm256_cvtss_f32 intrinsic on VS
...
Cast __m256 into __m128 first, the XMM variant of the intrinsic has been
around for a long enough time to be supported
2020-04-07 22:38:10 +03:00
Ari Lemmetti
901c25c0c8
Merge branch 'vaq'
2020-04-03 19:51:17 +03:00
Ari Lemmetti
51451be5ef
Handle cases where the number of pixels is not divisible by 32
2020-04-03 19:37:47 +03:00
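When a SIMD loop consumes 32 pixels per iteration (one 256-bit register of 8-bit samples), buffers whose length is not a multiple of 32 need a separate tail. Below is a scalar sketch of that loop shape, computing a pixel sum. The inner 32-wide loop stands in for one vector iteration, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

uint64_t pixel_sum(const uint8_t *px, size_t n)
{
  uint64_t sum = 0;
  size_t i = 0;
  /* Main loop: full 32-pixel chunks only. */
  for (; i + 32 <= n; i += 32) {
    uint32_t chunk = 0;
    for (size_t k = 0; k < 32; ++k) chunk += px[i + k];
    sum += chunk;
  }
  /* Tail: the remaining n % 32 pixels, one at a time. */
  for (; i < n; ++i) sum += px[i];
  return sum;
}
```

Writing the bound as `i + 32 <= n` rather than `i < n - 32` also avoids unsigned underflow when `n < 32`.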
siivonek
e5267f7706
Fix define for use with Visual Studio.
2020-04-03 15:11:01 +02:00
Pauli Oikkonen
addc1c3ede
Fix warning about potentially unused hsum_8x32b
...
There are a lot of alternative options available, such as making it
globally visible with a kvz_ prefix or force-inlining it.
This could be good too; hopefully it won't be compiled at all into translation
units where it's not used.
2020-04-02 16:44:22 +03:00
siivonek
566680af7b
Move function hsum to file where it is used to avoid errors.
2020-04-02 14:03:06 +02:00
siivonek
58be514e2a
Fix pipeline error.
2020-04-02 13:50:08 +02:00
Pauli Oikkonen
99889dab15
Fix switch(bool) in picture-avx2.c
...
It passes on GCC but warns on Clang
2020-03-31 15:42:19 +03:00
Jaakko Laitinen
af3d559d8d
Let pu-depth be defined per gop-layer
2020-03-17 17:57:18 +02:00
Pauli Oikkonen
60e7956dc5
Disable inaccurate integer variance calculation for now
2020-03-02 19:18:55 +02:00
Pauli Oikkonen
fc1b91335b
Implement variance calculation in integer math
...
Maybe this is a bit faster than FP, it's not accurate though
2020-03-02 18:17:18 +02:00
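The accuracy issue is easy to see with the sum-of-squares form Var = (n * sum(x^2) - (sum x)^2) / n^2: integer division drops the fractional part of the result. The sketch below compares it with a floating-point version. Both functions are hypothetical, assume n > 0, and are not the kvazaar implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Floating-point population variance (mean of squared deviations). */
double pixel_var_fp(const uint8_t *px, size_t n)
{
  assert(n > 0);
  double mean = 0.0, var = 0.0;
  for (size_t i = 0; i < n; ++i) mean += px[i];
  mean /= (double)n;
  for (size_t i = 0; i < n; ++i) {
    const double d = px[i] - mean;
    var += d * d;
  }
  return var / (double)n;
}

/* Integer version: exact sums, one truncating division at the end,
 * so the result is floor(true variance). */
uint64_t pixel_var_int(const uint8_t *px, size_t n)
{
  assert(n > 0);
  uint64_t sum = 0, sum_sq = 0;
  for (size_t i = 0; i < n; ++i) {
    sum += px[i];
    sum_sq += (uint64_t)px[i] * px[i];
  }
  return (n * sum_sq - sum * sum) / (n * n);
}
```

For the samples {1, 2, 3, 4}, the floating-point result is 1.25 and the integer one is 1. An error below 1 still matters for flat blocks, whose true variance is itself small.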
Pauli Oikkonen
35c825c75f
Move hsum_8x32b to avx2_common_functions
2020-02-27 17:52:17 +02:00
Pauli Oikkonen
b00ac7d1c4
AVX2 version of buffer variance calculation
2020-02-25 15:57:56 +02:00
Pauli Oikkonen
1bd9c6dd93
Make a strategy out of pixel_var
2020-02-24 19:37:36 +02:00
Ari Lemmetti
3c7dd0752f
Remove the broken "no mov" branch.
...
Causes hash mismatches for example in SlideShow sequence.
2020-02-03 15:26:31 +02:00
RLamm
30d5df40c5
Custom headers for the distributed coding
2020-01-29 15:54:49 +02:00
Pauli Oikkonen
c3d9e97e9f
Fix VS build
2019-12-12 18:34:55 +02:00
Pauli Oikkonen
7f238ca299
Remove debug print functions
...
Whoops
2019-12-12 18:19:31 +02:00
Pauli Oikkonen
eefb5e50b3
De-inline pred_filtered_dc functions, shouldn't make much difference though
2019-12-12 17:30:00 +02:00
Pauli Oikkonen
169314de4f
32x32 filtered DC prediction in AVX2
2019-12-11 18:17:06 +02:00
Pauli Oikkonen
fb2481b7e4
16x16 filtered DC implemented in AVX2
2019-12-10 15:54:50 +02:00
Pauli Oikkonen
da370ea36d
Implement AVX2 8x8 filtered DC algorithm
2019-11-28 14:10:10 +02:00
Pauli Oikkonen
5d9b7019ca
Implement a 4x4 filtered DC pred function
2019-11-26 17:05:54 +02:00
Pauli Oikkonen
f1485ab087
Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes?
2019-11-25 15:20:29 +02:00
Marko Viitanen
eb2caf9118
Fix intra angle filter, changed from gauss filter table to run-time calculated 4-tap filter
2019-11-19 15:15:21 +02:00
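In VVC, the coefficients of the 4-tap smoothing (Gaussian) intra interpolation filter are a simple function of the 1/32-sample phase p. This makes run-time calculation an alternative to a lookup table. The sketch below assumes that VVC formula and 8-bit samples. Whether this commit computes exactly these coefficients is an assumption, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Coefficients for 1/32-sample phase p (0..31); they always sum to 64. */
void gauss_coeffs(int p, int16_t c[4])
{
  assert(p >= 0 && p < 32);
  c[0] = (int16_t)(16 - (p >> 1));
  c[1] = (int16_t)(32 - (p >> 1));
  c[2] = (int16_t)(16 + (p >> 1));
  c[3] = (int16_t)(p >> 1);
}

/* Filter four consecutive reference samples, rounding and dividing by 64.
 * All taps are non-negative, so no clipping is needed for 8-bit input. */
int filter_4tap(const uint8_t ref[4], const int16_t c[4])
{
  const int sum = c[0] * ref[0] + c[1] * ref[1] + c[2] * ref[2] + c[3] * ref[3];
  return (sum + 32) >> 6;
}
```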
Pauli Oikkonen
979d66031c
Create a strategy out of intra_pred_filtered_dc
2019-11-19 14:50:31 +02:00
Marko Viitanen
466d8772b0
Apply JVET_P0170_ZERO_POS_SIMPLIFICATION in coeff bypass coding
2019-11-19 14:32:38 +02:00
Pauli Oikkonen
fa4bb86406
Optimize intra_pred_planar_avx2 for 4x4 blocks
2019-11-19 13:39:02 +02:00
Marko Viitanen
17a53230fd
Code cleanup, remove unused arrays and remove tabs
2019-11-18 09:01:23 +02:00
Pauli Oikkonen
4761d228f9
Start to vectorize the 4x4 loop
2019-11-15 17:32:40 +02:00
Pauli Oikkonen
8d45ab4951
Stupidify the 4x4 planar loop for vectorization
2019-11-14 17:14:04 +02:00
Pauli Oikkonen
6d7a4f555c
Also remove 16x16 (A * B^T)^T matrix multiply
...
Can be done using (B * A^T) instead; it's exactly the same
2019-10-28 16:19:42 +02:00
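Removing the (A * B^T)^T multiplies relies on the transpose rule (XY)^T = Y^T X^T, which gives (A B^T)^T = B A^T. Below is a small integer check of that identity. The helper names are hypothetical.

```c
#include <assert.h>

/* c = a * b^T for n x n row-major matrices: c[i][j] = row i of a . row j of b. */
void mul_abt(const int *a, const int *b, int *c, int n)
{
  for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {
      int s = 0;
      for (int k = 0; k < n; ++k) s += a[i * n + k] * b[j * n + k];
      c[i * n + j] = s;
    }
  }
}

void transpose(const int *m, int *t, int n)
{
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      t[j * n + i] = m[i * n + j];
}

/* Returns 1 if (a * b^T)^T equals b * a^T; scratch buffers limit n to 8. */
int transpose_identity_holds(const int *a, const int *b, int n)
{
  int ab_t[64], lhs[64], rhs[64];
  assert(n > 0 && n <= 8);
  mul_abt(a, b, ab_t, n);
  transpose(ab_t, lhs, n);
  mul_abt(b, a, rhs, n);
  for (int i = 0; i < n * n; ++i)
    if (lhs[i] != rhs[i]) return 0;
  return 1;
}
```

A routine that computes X * Y^T can therefore produce the transposed product simply by swapping its arguments, so no separate transposing kernel is needed.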
Pauli Oikkonen
2c2deb2366
Tidy AVX2 32x32 matrix multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
98ad78b333
Tidy the old AVX2 32x32 matrix multiply
...
It was actually a very good algorithm, just looked messy!
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
4a921cbdb5
Retain data as much in YMM registers as possible
...
This seems to make it a whole lot quicker
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
ac4d710e23
Unroll 32x32 matrix multiply, use all regs
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
a58608d0b8
Remove totally unnecessary (A * B^T)^T 32x32 multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
043f53539f
Implement a streamlined matrix-multiply 32x32 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e9da2d851b
Tidy 32x32 fast DCT's helper functions
2019-10-28 16:19:42 +02:00