Marko Viitanen
574c4d06ee
Fix use of log2_cg_size in coeff coding -> smaller blocks also decoded correctly
2020-08-27 18:26:16 +03:00
Marko Viitanen
20b66c9949
Sync to VTM 8.2 and add separate height to last_sig coding
2020-04-29 08:52:38 +03:00
Jan Beich
1fa69c705d
Rename truncate() from 30ce461d98
to avoid conflict with POSIX version
...
strategies/avx2/dct-avx2.c:55:23: error: static declaration of 'truncate' follows non-static declaration
static INLINE __m256i truncate(__m256i v, __m256i debias, int32_t shift)
^
/usr/include/stdio.h:448:6: note: previous declaration is here
int truncate(const char *, __off_t);
^
2020-04-22 16:09:42 +00:00
Marko Viitanen
86d76b19a4
Fix intra neighboring block selection and clean some unused code
2020-04-16 14:12:40 +03:00
Ari Lemmetti
f31dddc019
Bypass inverse quantization and inverse transform when trying early skip
2020-04-10 16:02:09 +03:00
Pauli Oikkonen
8617530b13
Use _mm_store_epi64 instead of _mm_cvtsi128_si64
...
Fix 32-bit builds that tend to lack the cvt intrinsic. Hope it will be
optimized to a movq r64, xmm on modern platforms though
2020-04-07 23:51:54 +03:00
Pauli Oikkonen
a82966c0f5
Fix lacking _mm256_cvtss_f32 intrinsic on VS
...
Cast __m256 into __m128 first, the XMM variant of the intrinsic has been
around for a long enough time to be supported
2020-04-07 22:38:10 +03:00
Ari Lemmetti
901c25c0c8
Merge branch 'vaq'
2020-04-03 19:51:17 +03:00
Ari Lemmetti
51451be5ef
Handle cases where the number of pixels is not divisible by 32
2020-04-03 19:37:47 +03:00
siivonek
e5267f7706
Fix define for use with Visual Studio.
2020-04-03 15:11:01 +02:00
Pauli Oikkonen
addc1c3ede
Fix warning about potentially unused hsum_8x32b
...
There's a lot of alternative options available, such as making it
globally visible with a kvz_ prefix, force inlining it, or anything.
This could be good too, hope it won't be compiled at all to translation
units where it's not used.
2020-04-02 16:44:22 +03:00
siivonek
566680af7b
Move function hsum to file where it is used to avoid errors.
2020-04-02 14:03:06 +02:00
siivonek
58be514e2a
Fix pipeline error.
2020-04-02 13:50:08 +02:00
Pauli Oikkonen
99889dab15
Fix switch(bool) in picture-avx2.c
...
It passes on GCC but warns on Clang
2020-03-31 15:42:19 +03:00
Jaakko Laitinen
af3d559d8d
Let pu-depth be defined per gop-layer
2020-03-17 17:57:18 +02:00
Pauli Oikkonen
60e7956dc5
Disable inaccurate integer variance calculation for now
2020-03-02 19:18:55 +02:00
Pauli Oikkonen
fc1b91335b
Implement variance calculation in integer math
...
Maybe this is a bit faster than FP, it's not accurate though
2020-03-02 18:17:18 +02:00
Pauli Oikkonen
35c825c75f
Move hsum_8x32b to avx2_common_functions
2020-02-27 17:52:17 +02:00
Pauli Oikkonen
b00ac7d1c4
AVX2 version of buffer variance calculation
2020-02-25 15:57:56 +02:00
Pauli Oikkonen
1bd9c6dd93
Make a strategy out of pixel_var
2020-02-24 19:37:36 +02:00
Ari Lemmetti
3c7dd0752f
Remove the broken "no mov" branch.
...
Causes hash mismatches for example in SlideShow sequence.
2020-02-03 15:26:31 +02:00
RLamm
30d5df40c5
Custom headers for the distributed coding
2020-01-29 15:54:49 +02:00
Pauli Oikkonen
c3d9e97e9f
Fix VS build
2019-12-12 18:34:55 +02:00
Pauli Oikkonen
7f238ca299
Remove debug print functions
...
Whoops
2019-12-12 18:19:31 +02:00
Pauli Oikkonen
eefb5e50b3
De-inline pred_filtered_dc functions, shouldn't make much difference though
2019-12-12 17:30:00 +02:00
Pauli Oikkonen
169314de4f
32x32 filtered DC prediction in AVX2
2019-12-11 18:17:06 +02:00
Pauli Oikkonen
fb2481b7e4
16x16 filtered DC implemented in AVX2
2019-12-10 15:54:50 +02:00
Pauli Oikkonen
da370ea36d
Implement AVX2 8x8 filtered DC algorithm
2019-11-28 14:10:10 +02:00
Pauli Oikkonen
5d9b7019ca
Implement a 4x4 filtered DC pred function
2019-11-26 17:05:54 +02:00
Pauli Oikkonen
f1485ab087
Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes?
2019-11-25 15:20:29 +02:00
Marko Viitanen
eb2caf9118
Fix intra angle filter, changed from gauss filter table to run-time calculated 4-tap filter
2019-11-19 15:15:21 +02:00
Pauli Oikkonen
979d66031c
Create a strategy out of intra_pred_filtered_dc
2019-11-19 14:50:31 +02:00
Marko Viitanen
466d8772b0
Apply JVET_P0170_ZERO_POS_SIMPLIFICATION in coeff bypass coding
2019-11-19 14:32:38 +02:00
Pauli Oikkonen
fa4bb86406
Optimize intra_pred_planar_avx2 for 4x4 blocks
2019-11-19 13:39:02 +02:00
Marko Viitanen
17a53230fd
Code cleanup, remove unused arrays and remove tabs
2019-11-18 09:01:23 +02:00
Pauli Oikkonen
4761d228f9
Start to vectorize the 4x4 loop
2019-11-15 17:32:40 +02:00
Pauli Oikkonen
8d45ab4951
Stupidify the 4x4 planar loop for vectorization
2019-11-14 17:14:04 +02:00
Pauli Oikkonen
6d7a4f555c
Also remove 16x16 (A * B^T)^T matrix multiply
...
Can be done using (B * A^T) instead, it's the exact same
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
2c2deb2366
Tidy AVX2 32x32 matrix multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
98ad78b333
Tidy the old AVX2 32x32 matrix multiply
...
It was actually a very good algorithm, just looked messy!
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
4a921cbdb5
Retain data as much in YMM registers as possible
...
This seems to make it a whole lot quicker
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
ac4d710e23
Unroll 32x32 matrix multiply, use all regs
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
a58608d0b8
Remove totally unnecessary (A * B^T)^T 32x32 multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
043f53539f
Implement a streamlined matrix-multiply 32x32 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e9da2d851b
Tidy 32x32 fast DCT's helper functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e382339182
Implement fast (butterfly) 32x32 DCT in AVX2
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
b5962dadac
Tidy indentation in AVX2 16x16 iDCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
36a8f89025
Fine-tune 16x16 AVX2 iDCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
ca9409de2b
Implement 16x16 DCT as butterfly algorithm in AVX2
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
7c69a26717
Use aligned loads and stores for AVX2 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
8e9c65dca6
Align DCT matrices and temp transform buffers
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
148a150522
Align DCT source and dest blocks to cache line
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
8e60bbf6a6
Slightly tune 16x16 forward DCT
...
Use an array of __m256i's to store temporary value, essentially letting
the compiler enforce alignment and use aligned loads and stores.
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
c0cc0e8a75
Optimize 16x16 multiply by only slicing right mat once
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e463d27f22
Implement streamlined generic 16x16 matrix multiply
...
It can't be this fast for real, can it?
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
beb85ce9d6
Reorder parameters for 8x8 matrix multiplies
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
292af62256
Implement tailored 16x16 forward DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
30ce461d98
Redo 4x4 matrix multiplication
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
07970ea82f
Streamline by-the-book 8x8 matrix multiplication
...
Also chop up the forward transform into two tailored multiply functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
7ec7ab3361
Implement a tailored AVX2 8x8 DCT
2019-10-28 16:19:42 +02:00
pkubaj
1d7fcf4227
Fix build on powerpc64 with LLVM
2019-09-12 15:05:00 +02:00
Pauli Oikkonen
99597b828a
Work around the ancient Win32 calling convention hassle
...
See if this'll work now
2019-09-06 13:14:42 +03:00
Pauli Oikkonen
c5ca18950c
Revert "Revert to 6924d90052
due to broken visual studio build"
...
This reverts commit 1dd0619bd7
.
2019-09-05 18:21:55 +03:00
Pauli Oikkonen
55529decd5
Implement _mm256_insert_epi32 and extract pseudo-ops
...
Visual Studio headers apparently lack these guys
2019-09-05 18:20:52 +03:00
Ari Lemmetti
557bcbc6aa
Make luma or chroma only inter "recon" or predict possible
2019-09-02 17:15:28 +03:00
RLamm
60be6d411c
Intra filtering fixed at least for luma. All intra modes output valid luma (hashes match), but chroma is still broken.
2019-08-30 16:14:00 +03:00
Marko Viitanen
cb0d7c340a
Use the new PDPC filtering in angular intra
2019-08-23 14:44:41 +03:00
Marko Viitanen
5bebb18943
Change intra filtering according to VTM6
2019-08-23 08:56:35 +03:00
Marko Viitanen
a16efe6b52
Merge remote-tracking branch 'remotes/github_kvazaar/master'
...
# Conflicts:
# build/kvazaar_VS2013.sln
# build/kvazaar_VS2015.sln
# build/kvazaar_VS2017.sln
# build/kvazaar_cli/kvazaar_cli.vcxproj
# build/kvazaar_lib/kvazaar_lib.vcxproj
# build/kvazaar_tests/kvazaar_tests.vcxproj
# src/encode_coding_tree.c
# src/encode_coding_tree.h
# src/encoder_state-bitstream.c
# src/inter.c
# src/strategies/avx2/quant-avx2.c
2019-08-22 15:12:01 +03:00
Ari Lemmetti
1dd0619bd7
Revert to 6924d90052
due to broken visual studio build
2019-08-08 15:15:34 +03:00
Pauli Oikkonen
2852baa673
Separate sign3_diff_epu8 from calc_eo_cat
...
Just to keep things simple, clear and obvious
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a858e7dd4b
Combine duplicate code into inline functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
de0e97f711
Take 8/16/24b loads and stores into separate functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
10979f58fe
Tidy up code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
9cc11976c0
Combine the delta accumulation from edge and band ddistortion into shared func
...
This won't reduce object size, but there'll be less duplicate code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
55d877bd66
Vectorize sao_edge_ddistortion
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
aef0f301d3
Fix function signatures
...
Mark anything intended as read-only to be const, and fix alignment
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
997fd369b3
Redo calc_sao_edge_dir_avx2
...
Do it wider, 32 pixels at once!
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
db1e475e02
Use i32 instead of i8 for x/y offsets
...
Doesn't matter too much, because this number isn't used in SIMD
computation, only as a memory reference offset.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
12de466ef5
Reimplement non-band SAO color reconstruction in AVX2
...
Streamline things to work on 32 pixels at once instead of 8
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
e8bff99329
Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction
...
Vectorize it all, hope this helps with perf
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
7b5dffa855
Implement calc_sao_offset_array in AVX2
...
To be efficient, the AVX2 color reconstruction algorithm will need
offsets in byte, not dword, arrays. This is completely specific to 8-bit
pixels and the function signature is fundamentally distinct from the
generic algorithm, so it's better to not strategize SAO offset array
calculation.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
08881f5e9b
(TEMP) (TODO) (whatever) Avoid compiler warnings
...
I want the CI to not crash on its -Wall -Werror, but instead to actually
build the thing and report me about actual memory errors etc
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
c18adc5ee0
Redo sao_band_ddistortion_avx2
...
Avoid branching and do the entire thing on 32 pixels at once in YMMs.
Also make the sao_bands function parameter const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
1bb9a079a8
Fix indentation
2019-08-07 16:35:24 +03:00
Reima Hyvönen
7bc959c7c5
3 sao functions are now working
2019-08-07 16:35:24 +03:00
Reima Hyvönen
0e0f2d3490
made to clear sum vector after it has been set to memory
2019-08-07 16:35:24 +03:00
Reima Hyvönen
f146de7acb
removed some variables to prevent memory losses
2019-08-07 16:35:24 +03:00
Reima Hyvönen
247c3a7a71
conversed gined to unsigned int
2019-08-07 16:35:24 +03:00
Reima Hyvönen
ac5c216974
Some more memory error preventing to sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3fb1cbca35
more editing sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
afbb6fb960
some more modifications to sao_edge_ddistortion_avx2 to prevent memory failures
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3496a57f7a
Edited sao_edge_ddistortion_avx2 to avoid memory overflow
2019-08-07 16:35:24 +03:00
Reima Hyvönen
267ba1d6ce
Modified sao_band_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
e70663b245
added some sub commands to avoid memory read errors
2019-08-07 16:35:24 +03:00
Reima Hyvönen
59dfb4570c
Converted some loads to load int8_t instead ints
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8b253209a8
Found false address load from calc_sao_edge_dir. Should now work like generic
2019-08-07 16:35:24 +03:00
Reima Hyvönen
50e0a47b7a
Took away __restrict
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8a39eb674e
Removed c-variable from calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
bc0a36830d
Clerified some 6 pixel loads
2019-08-07 16:35:24 +03:00