Ari Lemmetti
3c7dd0752f
Remove the broken "no mov" branch.
...
Causes hash mismatches, for example in the SlideShow sequence.
2020-02-03 15:26:31 +02:00
RLamm
30d5df40c5
Custom headers for the distributed coding
2020-01-29 15:54:49 +02:00
Pauli Oikkonen
c3d9e97e9f
Fix VS build
2019-12-12 18:34:55 +02:00
Pauli Oikkonen
7f238ca299
Remove debug print functions
...
Whoops
2019-12-12 18:19:31 +02:00
Pauli Oikkonen
eefb5e50b3
De-inline pred_filtered_dc functions, shouldn't make much difference though
2019-12-12 17:30:00 +02:00
Pauli Oikkonen
169314de4f
32x32 filtered DC prediction in AVX2
2019-12-11 18:17:06 +02:00
Pauli Oikkonen
fb2481b7e4
16x16 filtered DC implemented in AVX2
2019-12-10 15:54:50 +02:00
Pauli Oikkonen
da370ea36d
Implement AVX2 8x8 filtered DC algorithm
2019-11-28 14:10:10 +02:00
Pauli Oikkonen
5d9b7019ca
Implement a 4x4 filtered DC pred function
2019-11-26 17:05:54 +02:00
Pauli Oikkonen
f1485ab087
Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes?
2019-11-25 15:20:29 +02:00
Pauli Oikkonen
fa4bb86406
Optimize intra_pred_planar_avx2 for 4x4 blocks
2019-11-19 13:39:02 +02:00
Marko Viitanen
17a53230fd
Code cleanup, remove unused arrays and remove tabs
2019-11-18 09:01:23 +02:00
Pauli Oikkonen
4761d228f9
Start to vectorize the 4x4 loop
2019-11-15 17:32:40 +02:00
Pauli Oikkonen
8d45ab4951
Stupidify the 4x4 planar loop for vectorization
2019-11-14 17:14:04 +02:00
Pauli Oikkonen
6d7a4f555c
Also remove 16x16 (A * B^T)^T matrix multiply
...
Can be done using (B * A^T) instead, it's the exact same
2019-10-28 16:19:42 +02:00
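The identity behind the commit above, (A * B^T)^T = B * A^T, can be checked with a small scalar sketch. This is only an illustration under stated assumptions: the helper names `mul_abt`, `transpose` and `transpose_identity_holds` are hypothetical, the matrices are 4x4 rather than 16x16 to keep the example short, and none of this is the AVX2 code from the repository.

```c
#include <stdint.h>
#include <string.h>

#define N 4  /* illustrative size only; the commit concerns 16x16 blocks */

/* dst = a * b^T for row-major N x N matrices (hypothetical helper). */
static void mul_abt(const int32_t *a, const int32_t *b, int32_t *dst)
{
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      int32_t sum = 0;
      for (int k = 0; k < N; k++) {
        sum += a[i * N + k] * b[j * N + k];
      }
      dst[i * N + j] = sum;
    }
  }
}

/* dst = src^T for row-major N x N matrices (hypothetical helper). */
static void transpose(const int32_t *src, int32_t *dst)
{
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      dst[j * N + i] = src[i * N + j];
    }
  }
}

/* Returns 1 when (a * b^T)^T equals b * a^T element by element. */
static int transpose_identity_holds(const int32_t *a, const int32_t *b)
{
  int32_t abt[N * N], abt_t[N * N], bat[N * N];
  mul_abt(a, b, abt);      /* a * b^T            */
  transpose(abt, abt_t);   /* (a * b^T)^T        */
  mul_abt(b, a, bat);      /* b * a^T, no transpose pass needed */
  return memcmp(abt_t, bat, sizeof bat) == 0;
}
```

Because swapping the operands gives the same result, a separate transposing multiply is redundant, which is presumably why it could be removed.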
Pauli Oikkonen
2c2deb2366
Tidy AVX2 32x32 matrix multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
98ad78b333
Tidy the old AVX2 32x32 matrix multiply
...
It was actually a very good algorithm, just looked messy!
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
4a921cbdb5
Retain data in YMM registers as much as possible
...
This seems to make it a whole lot quicker
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
ac4d710e23
Unroll 32x32 matrix multiply, use all regs
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
a58608d0b8
Remove totally unnecessary (A * B^T)^T 32x32 multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
043f53539f
Implement a streamlined matrix-multiply 32x32 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e9da2d851b
Tidy 32x32 fast DCT's helper functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e382339182
Implement fast (butterfly) 32x32 DCT in AVX2
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
b5962dadac
Tidy indentation in AVX2 16x16 iDCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
36a8f89025
Fine-tune 16x16 AVX2 iDCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
ca9409de2b
Implement 16x16 DCT as butterfly algorithm in AVX2
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
7c69a26717
Use aligned loads and stores for AVX2 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
8e9c65dca6
Align DCT matrices and temp transform buffers
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
148a150522
Align DCT source and dest blocks to cache line
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
8e60bbf6a6
Slightly tune 16x16 forward DCT
...
Use an array of __m256i's to store temporary value, essentially letting
the compiler enforce alignment and use aligned loads and stores.
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
c0cc0e8a75
Optimize 16x16 multiply by only slicing right mat once
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e463d27f22
Implement streamlined generic 16x16 matrix multiply
...
It can't be this fast for real, can it?
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
beb85ce9d6
Reorder parameters for 8x8 matrix multiplies
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
292af62256
Implement tailored 16x16 forward DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
30ce461d98
Redo 4x4 matrix multiplication
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
07970ea82f
Streamline by-the-book 8x8 matrix multiplication
...
Also chop up the forward transform into two tailored multiply functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
7ec7ab3361
Implement a tailored AVX2 8x8 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
99597b828a
Work around the ancient Win32 calling convention hassle
...
See if this'll work now
2019-09-06 13:14:42 +03:00
Pauli Oikkonen
c5ca18950c
Revert "Revert to 6924d90052 due to broken visual studio build"
...
This reverts commit 1dd0619bd7.
2019-09-05 18:21:55 +03:00
Pauli Oikkonen
55529decd5
Implement _mm256_insert_epi32 and extract pseudo-ops
...
Visual Studio headers apparently lack these guys
2019-09-05 18:20:52 +03:00
Ari Lemmetti
557bcbc6aa
Make luma or chroma only inter "recon" or predict possible
2019-09-02 17:15:28 +03:00
Marko Viitanen
a16efe6b52
Merge remote-tracking branch 'remotes/github_kvazaar/master'
...
# Conflicts:
# build/kvazaar_VS2013.sln
# build/kvazaar_VS2015.sln
# build/kvazaar_VS2017.sln
# build/kvazaar_cli/kvazaar_cli.vcxproj
# build/kvazaar_lib/kvazaar_lib.vcxproj
# build/kvazaar_tests/kvazaar_tests.vcxproj
# src/encode_coding_tree.c
# src/encode_coding_tree.h
# src/encoder_state-bitstream.c
# src/inter.c
# src/strategies/avx2/quant-avx2.c
2019-08-22 15:12:01 +03:00
Ari Lemmetti
1dd0619bd7
Revert to 6924d90052 due to broken visual studio build
2019-08-08 15:15:34 +03:00
Pauli Oikkonen
2852baa673
Separate sign3_diff_epu8 from calc_eo_cat
...
Just to keep things simple, clear and obvious
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a858e7dd4b
Combine duplicate code into inline functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
de0e97f711
Take 8/16/24b loads and stores into separate functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
10979f58fe
Tidy up code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
9cc11976c0
Combine the delta accumulation from edge and band ddistortion into shared func
...
This won't reduce object size, but there'll be less duplicate code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
55d877bd66
Vectorize sao_edge_ddistortion
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
aef0f301d3
Fix function signatures
...
Mark anything intended as read-only to be const, and fix alignment
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
997fd369b3
Redo calc_sao_edge_dir_avx2
...
Do it wider, 32 pixels at once!
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
db1e475e02
Use i32 instead of i8 for x/y offsets
...
Doesn't matter too much, because this number isn't used in SIMD
computation, only as a memory reference offset.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
12de466ef5
Reimplement non-band SAO color reconstruction in AVX2
...
Streamline things to work on 32 pixels at once instead of 8
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
e8bff99329
Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction
...
Vectorize it all, hope this helps with perf
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
7b5dffa855
Implement calc_sao_offset_array in AVX2
...
To be efficient, the AVX2 color reconstruction algorithm will need
offsets in byte, not dword, arrays. This is completely specific to 8-bit
pixels and the function signature is fundamentally distinct from the
generic algorithm, so it's better to not strategize SAO offset array
calculation.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
08881f5e9b
(TEMP) (TODO) (whatever) Avoid compiler warnings
...
I want the CI to not crash on its -Wall -Werror, but instead to actually
build the thing and report actual memory errors etc. to me
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
c18adc5ee0
Redo sao_band_ddistortion_avx2
...
Avoid branching and do the entire thing on 32 pixels at once in YMMs.
Also make the sao_bands function parameter const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
1bb9a079a8
Fix indentation
2019-08-07 16:35:24 +03:00
Reima Hyvönen
7bc959c7c5
3 sao functions are now working
2019-08-07 16:35:24 +03:00
Reima Hyvönen
0e0f2d3490
Clear the sum vector after it has been stored to memory
2019-08-07 16:35:24 +03:00
Reima Hyvönen
f146de7acb
removed some variables to prevent memory losses
2019-08-07 16:35:24 +03:00
Reima Hyvönen
247c3a7a71
Converted signed to unsigned int
2019-08-07 16:35:24 +03:00
Reima Hyvönen
ac5c216974
Some more memory error prevention in sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3fb1cbca35
More editing of sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
afbb6fb960
some more modifications to sao_edge_ddistortion_avx2 to prevent memory failures
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3496a57f7a
Edited sao_edge_ddistortion_avx2 to avoid memory overflow
2019-08-07 16:35:24 +03:00
Reima Hyvönen
267ba1d6ce
Modified sao_band_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
e70663b245
added some sub commands to avoid memory read errors
2019-08-07 16:35:24 +03:00
Reima Hyvönen
59dfb4570c
Converted some loads to load int8_t instead of ints
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8b253209a8
Found a wrong address load in calc_sao_edge_dir. Should now work like the generic version
2019-08-07 16:35:24 +03:00
Reima Hyvönen
50e0a47b7a
Took away __restrict
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8a39eb674e
Removed c-variable from calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
bc0a36830d
Clarified some 6 pixel loads
2019-08-07 16:35:24 +03:00
Reima Hyvönen
1a8b211e05
Added break to line 170
2019-08-07 16:35:24 +03:00
Reima Hyvönen
d05e750ebe
Added some switches to prevent segmentation fault from reading
2019-08-07 16:35:24 +03:00
Reima Hyvönen
203580047d
Defined some AVX functions
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c884c738b1
Updated some commands to match the standard
2019-08-07 16:35:24 +03:00
Reima Hyvönen
b412ed2f59
Removed some setr calls and used loads in calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c6cc063534
Converted some hadd operations in calc_sao_edge_dir_avx2 to cast and extract
2019-08-07 16:35:24 +03:00
Reima Hyvönen
47ac109b10
Optimized some of sao_reconstruct_color_avx2 when sao->type == SAO_TYPE_BAND
2019-08-07 16:35:24 +03:00
Reima Hyvönen
96dc60a1ed
First working optimization
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c148aff9fb
Some optimization done to function sao_reconstruct_color_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
bf16ba6cc4
Remade sao_edge_ddistortion_avx2 and calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
79dc39a676
Some editing for sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
06ee52924e
some reconst done to calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
5fbc65d823
reconst optimization doesn't work yet
2019-08-07 16:35:24 +03:00
Reima Hyvönen
d29f834a69
Remove useless function
2019-08-07 16:35:24 +03:00
Reima Hyvönen
a232a12160
calc_sao_edge_dir_avx2 updated
2019-08-07 16:35:24 +03:00
Reima Hyvönen
b1febc02a5
sao_edge_ddistortion_avx2 now working properly
2019-08-07 16:35:24 +03:00
Reima Hyvönen
cd6092a1ec
Still too many bits, looking for where they appear
2019-08-07 16:35:24 +03:00
Reima Hyvönen
7853be8eeb
Incomplete optimization
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
8d48bee180
Tidy fast coeff cost code
2019-07-09 18:01:54 +03:00
Pauli Oikkonen
201a43b08e
Clean up the RD-estimation code
2019-07-09 18:01:54 +03:00
Pauli Oikkonen
b111df5073
Create preliminary version of improved cost estimator
2019-07-09 18:01:54 +03:00
Marko Viitanen
8280bd3217
Add channel info to angular_pred and fix the displacement tables.
...
Also includes 4-tap intra filtering code commented out
2019-07-04 09:35:47 +03:00
Pauli Oikkonen
081d16fc33
Fix intrinsics that may be missing on some systems
...
Create a header to collect all the workarounds for missing intrinsics
in one place
2019-05-23 19:59:40 +03:00
Marko Viitanen
30a8a7b97c
WIP fixing the last significant xy coding
2019-05-07 15:01:02 +03:00
Pauli Oikkonen
7175d20bb2
Still include stdint.h for non-vector builds
2019-04-15 19:36:01 +03:00
Pauli Oikkonen
1315c7e2b0
Do not compile any vector code for non-SSE4/AVX2 builds
2019-04-15 19:10:48 +03:00
Pauli Oikkonen
f5f70e7bc5
Merge branch 'sad-optimization'
2019-04-15 19:02:01 +03:00