Jaakko Laitinen
8bd1a2b667
Update help message
2020-03-31 13:19:05 +03:00
Jaakko Laitinen
b4f5486190
Set intra qp offset default to auto
2020-03-31 12:58:40 +03:00
Jaakko Laitinen
740688c67d
Add auto option to intra qp offset
2020-03-31 11:56:44 +03:00
Pauli Oikkonen
0c7bfa7dc9
Fix AVX2 on Clang
...
Besides just -mavx2, AVX2 support depends on a couple minor instruction
set extensions that should always exist on AVX2-capable hardware. Too
bad the different bit twiddling instructions are invoked slightly
differently between GCC and Clang, but now Clang seems to also produce
an AVX2-capable build.
2020-03-26 18:48:48 +02:00
siivonek
89d3e674ce
Comment out code which possibly messes up OBA
2020-03-26 17:49:31 +02:00
siivonek
be7d9ddec5
Fix error in frame variance calculation. Chroma channels were not added to variance
2020-03-26 14:33:00 +02:00
Jaakko Laitinen
45ca8f8113
Merge branch 'master' into 'extended_pu-depths'
2020-03-25 15:11:08 +02:00
siivonek
5986e71535
Fix mistake
2020-03-20 13:43:44 +02:00
Jaakko Laitinen
d6ffe9e495
Update docs
2020-03-20 13:27:07 +02:00
Jaakko Laitinen
621450cc1d
Update --help
2020-03-20 13:07:48 +02:00
Jaakko Laitinen
aaac3df69b
Add prefix to kvazaar.h define
2020-03-20 09:04:00 +02:00
siivonek
2a85be5752
Move qp_to_lambda so it is defined before use. Change some tabs to spaces
2020-03-19 22:13:53 +02:00
siivonek
0a4ce3c0aa
Add vaq to new rate control
2020-03-19 21:43:52 +02:00
siivonek
1bbc598d75
Merge branch 'master' into vaq
2020-03-19 20:19:43 +02:00
Joose Sainio
b53911d637
Merge branch 'rc-intra'
2020-03-19 13:34:15 +02:00
Joose Sainio
a304a8ea6e
Add weights for GOP 16 based on fitting a power curve to bits spent by HM
2020-03-19 11:13:43 +02:00
Joose Sainio
e823ac1dae
miscellaneous fixes
...
- bump library version
- add help text for --clip-neighbour
- update the default values of --clip-neighbour and --intra-bits
- update tests to be more sensible
2020-03-19 10:47:28 +02:00
Jaakko Laitinen
b2ddba38c2
Set correct size for pu-depth min/max data structure
2020-03-19 09:29:43 +02:00
Joose Sainio
2c345bc3cf
try to fix tsan issue
2020-03-18 14:58:54 +02:00
Jaakko Laitinen
fe428dcbe1
Fix no gop functionality
2020-03-18 11:03:33 +02:00
Jaakko Laitinen
af3d559d8d
Let pu-depth be defined per gop-layer
2020-03-17 17:57:18 +02:00
Ari Lemmetti
cbd77944d8
Costs in rough intra search may be negative. Get rid of UBSan error.
2020-03-16 22:13:14 +02:00
Ari Lemmetti
aa0ade3f65
Cast values to unsigned to make UBSan not trigger due to left-shifting negatives
2020-03-16 19:52:34 +02:00
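Note on the commit above: left-shifting a negative signed integer is undefined behaviour in C, which is what UBSan reports. A minimal sketch of the cast-to-unsigned approach follows; the helper name `shl_via_unsigned` is hypothetical and is not taken from the Kvazaar source.

```c
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration only).
 * Shifts a signed value left by operating on its unsigned bit pattern. */
static int32_t shl_via_unsigned(int32_t value, unsigned shift)
{
  /* A left shift of uint32_t is well defined for shift < 32 (the result
   * wraps modulo 2^32), so UBSan has nothing to flag here. Converting an
   * out-of-range result back to int32_t is implementation-defined, but
   * yields the two's-complement value on mainstream compilers. */
  return (int32_t)((uint32_t)value << shift);
}
```

The caller still has to keep `shift` below 32; the cast only removes the undefined behaviour tied to the sign of `value`.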
RLamm
27fe716654
Fixed reference POC indexing
2020-03-11 15:33:37 +02:00
RLamm
bf24831780
Attempt to fix random crashes
2020-03-11 15:31:47 +02:00
RLamm
887659db1f
Attempted to scale the extra_mvs
2020-03-11 15:31:46 +02:00
siivonek
8d9719ff90
Merge branch 'master' into vaq
2020-03-05 14:17:01 +02:00
Joose Sainio
c9a8f2a596
Completely disable intra based model for frame 1
2020-03-04 12:52:13 +02:00
Joose Sainio
19c79c3e58
don't use the intra frame based estimation if the result is bad
2020-03-04 09:26:22 +02:00
Ari Lemmetti
7b7358c25a
Update presets veryslow and placebo a bit
...
Both now use --gop 16, --intra-qp-offset -3, --me tz, and --transform-skip
2020-03-03 20:41:01 +02:00
Pauli Oikkonen
60e7956dc5
Disable inaccurate integer variance calculation for now
2020-03-02 19:18:55 +02:00
Pauli Oikkonen
fc1b91335b
Implement variance calculation in integer math
...
Maybe this is a bit faster than FP, it's not accurate though
2020-03-02 18:17:18 +02:00
Pauli Oikkonen
35c825c75f
Move hsum_8x32b to avx2_common_functions
2020-02-27 17:52:17 +02:00
Pauli Oikkonen
b00ac7d1c4
AVX2 version of buffer variance calculation
2020-02-25 15:57:56 +02:00
siivonek
a380e43bda
Add chroma channels to variance calculation.
2020-02-24 19:54:34 +02:00
Pauli Oikkonen
1bd9c6dd93
Make a strategy out of pixel_var
2020-02-24 19:37:36 +02:00
Pauli Oikkonen
86ebf366e1
fix typo
2020-02-24 18:18:10 +02:00
Joose Sainio
f81de41775
Merge branch 'master' into rc-intra
2020-02-24 15:30:57 +02:00
siivonek
5688bcd646
Merge branch 'master' into vaq
2020-02-21 17:11:10 +02:00
siivonek
908ecb1767
Add rounding to aq offsets. Fix typo
2020-02-21 13:51:43 +02:00
Ari Lemmetti
1dfc69b42e
Consider merge index bits in merge analysis and early skip
2020-02-20 09:43:58 +02:00
Joose Sainio
7deb22c8e8
Merge branch 'master' into rc-intra
2020-02-19 15:01:04 +02:00
Kari Siivonen (TAU)
c972ca9067
Add assert to check if deltaQP out of bounds. Clip adaptive QP to [-13, 12].
2020-02-18 13:20:26 +02:00
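Note on the commit above: the clipping step can be sketched as below. Only the bounds -13 and 12 come from the commit message; the names `AQ_OFFSET_MIN`, `AQ_OFFSET_MAX`, `clip_aq_offset` and `checked_aq_offset` are assumptions made for illustration.

```c
#include <assert.h>

/* Bounds quoted in the commit message; the identifiers are hypothetical. */
enum { AQ_OFFSET_MIN = -13, AQ_OFFSET_MAX = 12 };

/* Clamp an adaptive QP offset into [AQ_OFFSET_MIN, AQ_OFFSET_MAX]. */
static int clip_aq_offset(int offset)
{
  if (offset < AQ_OFFSET_MIN) return AQ_OFFSET_MIN;
  if (offset > AQ_OFFSET_MAX) return AQ_OFFSET_MAX;
  return offset;
}

/* Clamp, then assert that the result really is inside the range. */
static int checked_aq_offset(int offset)
{
  const int clipped = clip_aq_offset(offset);
  assert(clipped >= AQ_OFFSET_MIN && clipped <= AQ_OFFSET_MAX);
  return clipped;
}
```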
Kari Siivonen (TAU)
f07990794f
Fix error in vaq pixel blit range calculation
2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)
57ed40c263
Fix application of aq offset
2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)
be2f420d61
Change: vaq requires a parameter. The parameter defines vaq strength, e.g. 15 == 1.5
2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)
bf1b2c1e22
Add define for vaq strength parameter
2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)
150559a7e8
Fix bugs. Enable set_qp_in_cu when using vaq
2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)
c8c71274ee
Change tabs to spaces.
2020-02-18 13:20:26 +02:00
siivonek
888382953d
Implement calculation of vaq values. Values not used yet.
2020-02-18 13:20:25 +02:00
siivonek
ad40a88c09
Add no-vaq option to vaq
2020-02-18 13:20:25 +02:00
siivonek
09f0a1c52e
Fix typo in comment
2020-02-18 13:20:25 +02:00
siivonek
84fb3fd7d1
aq: Add --vaq commandline option
2020-02-18 13:20:25 +02:00
Joose Sainio
2a98f5db1e
fix intra-bits for lp-gop
2020-02-18 10:38:29 +02:00
Ari Lemmetti
71d9327f62
Further improve fast bipred
2020-02-17 20:32:52 +02:00
Ari Lemmetti
80c26870d5
Update docs
2020-02-15 23:29:18 +02:00
Ari Lemmetti
ebb183cc01
Add option to make intra QP offset configurable
2020-02-15 22:54:48 +02:00
Ari Lemmetti
be3e08d6db
Add gop.h to Makefile
2020-02-15 22:54:47 +02:00
Ari Lemmetti
1354acd358
Prevent negative values being written to SPS with --gop=0
2020-02-15 22:54:47 +02:00
Ari Lemmetti
fe4869916c
Disable GOP and intra qp offset for all-intra coding automatically
2020-02-15 22:54:46 +02:00
Ari Lemmetti
9849fb7c77
Enable experimental rate control for GOP 16
2020-02-15 22:54:46 +02:00
Ari Lemmetti
a0a22dec8a
Remove deprecated / unused lambda adjustments
2020-02-15 22:54:46 +02:00
Arttu Ylä-Outinen
829a70e6a7
Copy lowdelay GOP definition from HM
2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen
28f99c0b87
Change definition of 8-GOP to match HM
2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen
636fa8fbdd
Fix maximum decoded picture buffer size
2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen
ebd5156db5
Add definition for random access GOP of length 16
2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen
6653f06dd0
Only compute GOP layer weights when RC is enabled
2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen
c8fff1e0d6
Use a larger number of bits for POC lsb when needed
...
Changes the number of bits used for coding the least significant bits of
the POC based on the GOP size.
2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen
d757a832c2
Change GOP QP offset handling to match HM
...
Adds fields qp_model_scale and qp_model_offset to kvz_gop_config and
intra_qp_offset to kvz_config.
2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen
f37dcd5879
Move GOP definition to a separate file
...
Moves definition of the 8-GOP from cfg.c to gop.h.
2020-02-15 22:36:55 +02:00
Ari Lemmetti
6e1007a3e7
Get rid of LAMBA! (Commit #3000)
2020-02-15 22:32:52 +02:00
Ari Lemmetti
0c02e71b43
Remove minor error from readme
2020-02-15 22:29:08 +02:00
Joose Sainio
e90d3141a2
Merge branch 'master' into rc-intra
2020-02-05 11:06:56 +02:00
Ari Lemmetti
9a0236bb4e
Add option 'zero-coeff-rdo'
2020-02-04 21:26:29 +02:00
Ari Lemmetti
886ff36d12
Initial implementation of fast bipred.
2020-02-04 15:46:23 +02:00
Ari Lemmetti
3c7dd0752f
Remove the broken "no mov" branch.
...
Causes hash mismatches for example in SlideShow sequence.
2020-02-03 15:26:31 +02:00
RLamm
bf8941ddb8
Added comment about partial-coding usage
2020-01-31 16:19:48 +02:00
RLamm
b8488ab48d
Changed "partial-coding" variables to uint32_t
2020-01-31 16:02:29 +02:00
RLamm
76e3249754
Changed parameter "slicer" to "partial-coding" to avoid confusion.
2020-01-31 14:22:32 +02:00
RLamm
30d5df40c5
Custom headers for the distributed coding
2020-01-29 15:54:49 +02:00
Joose Sainio
54571529a4
Fix accessing previous frame that didn't exist
2020-01-17 10:48:35 +02:00
Joose Sainio
5c671d20e1
Use the new clipping only in situations where it actually helps
2020-01-17 09:08:21 +02:00
Joose Sainio
3c34d7c863
Fix qp estimation and checking of previous frames that don't exist
2020-01-15 09:32:04 +02:00
Joose Sainio
1a35c22a52
Change clipping of lambda and qp for ctus on OBA rc
...
Instead of clipping qp and lambda to the last value from the state,
clip to the previous frame with the same layer and, if no such frame exists,
clip to the previous frame
2020-01-14 14:46:05 +02:00
Pauli Oikkonen
c3d9e97e9f
Fix VS build
2019-12-12 18:34:55 +02:00
Pauli Oikkonen
7f238ca299
Remove debug print functions
...
Whoops
2019-12-12 18:19:31 +02:00
Pauli Oikkonen
eefb5e50b3
De-inline pred_filtered_dc functions, shouldn't make much difference though
2019-12-12 17:30:00 +02:00
Pauli Oikkonen
169314de4f
32x32 filtered DC prediction in AVX2
2019-12-11 18:17:06 +02:00
Pauli Oikkonen
fb2481b7e4
16x16 filtered DC implemented in AVX2
2019-12-10 15:54:50 +02:00
Joose Sainio
b78aa7b272
save c and k to frame
2019-12-06 10:52:54 +02:00
Joose Sainio
5b10e5fb7e
parameterize the clipping option
2019-12-06 09:51:04 +02:00
Pauli Oikkonen
da370ea36d
Implement AVX2 8x8 filtered DC algorithm
2019-11-28 14:10:10 +02:00
Pauli Oikkonen
5d9b7019ca
Implement a 4x4 filtered DC pred function
2019-11-26 17:05:54 +02:00
Joose Sainio
ca0060cbba
try the original clipping
2019-11-26 15:13:04 +02:00
Pauli Oikkonen
f1485ab087
Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes?
2019-11-25 15:20:29 +02:00
Joose Sainio
ab2fded8af
Update threadwrapper to enable pthread_rwlock_t
2019-11-21 13:38:40 +02:00
Joose Sainio
eb78aead1f
Fix additional potential data races
2019-11-21 11:03:12 +02:00
Joose Sainio
35d7e0d88b
Fix data race
2019-11-21 10:25:04 +02:00
Pauli Oikkonen
979d66031c
Create a strategy out of intra_pred_filtered_dc
2019-11-19 14:50:31 +02:00
Joose Sainio
0e8815a3d8
test clipping qp to previous frame instead of previous ctus
2019-11-19 14:32:31 +02:00
Joose Sainio
ddb4e5a131
move the intra bit calculation so that it is used also with lambda rc
2019-11-19 14:16:48 +02:00
Joose Sainio
a07833f3e6
check that mallocs in rc initialization were successful
...
only call kvz_update_after_picture when using the OBA rc
2019-11-19 13:59:44 +02:00
Joose Sainio
50d410a316
re-enable static qp encoding and lambda rc
2019-11-19 13:45:58 +02:00
Pauli Oikkonen
fa4bb86406
Optimize intra_pred_planar_avx2 for 4x4 blocks
2019-11-19 13:39:02 +02:00
Joose Sainio
57e5615ece
Fix incorrect intra rc calculation skipping
2019-11-19 13:25:31 +02:00
Joose Sainio
6cc3bcd87e
Command line parameters for oba rc and implementation of the usage of the intra parameter
2019-11-19 09:29:06 +02:00
Joose Sainio
eb73548af5
Encode first frame completely before starting others to enable owf
2019-11-18 09:51:37 +02:00
Pauli Oikkonen
4761d228f9
Start to vectorize the 4x4 loop
2019-11-15 17:32:40 +02:00
Pauli Oikkonen
8d45ab4951
Stupidify the 4x4 planar loop for vectorization
2019-11-14 17:14:04 +02:00
Joose Sainio
c759c138ed
Prepare the rc data structure to be shared among all frame encoders
2019-11-13 11:56:25 +02:00
Joose Sainio
cdb7c851a4
Fix weight calculation
2019-11-13 08:55:31 +02:00
Joose Sainio
b9b01f8036
WPP with threading
2019-11-12 12:12:57 +02:00
Joose Sainio
615973adca
should enable threading with wpp when owf is not used
2019-11-12 09:03:00 +02:00
Pauli Oikkonen
6f13f6525c
Merge branch 'new_prints'
2019-11-07 17:04:21 +02:00
Joose Sainio
d353f7dd1a
Disable debug prints, fix multiple bugs in the calculation
2019-11-07 15:08:57 +02:00
mercat
57e8c3ebc2
Merge branch 'ML-cplx_red_ICIP'
2019-11-07 13:25:47 +02:00
Pauli Oikkonen
558f0ec401
Mbps, not mbps
2019-11-05 18:06:00 +02:00
Pauli Oikkonen
2edf533925
Tidy the end report printing
...
Also fix a bug with non-integer target FPS
2019-11-05 17:20:00 +02:00
Joose Sainio
408fd4ccb6
Fix lambda and qp calculation for intra frames
...
also fixes a bug in selecting the clip neighbor lambda and clip neighbor qp
for inter frames
2019-11-05 10:51:39 +02:00
Pauli Oikkonen
c7313ce567
Store AVG QP information in encmain
2019-11-04 17:08:07 +02:00
Reima Hyvönen
80575c59bf
Some updates done to get right bitrate and avg QP
2019-10-31 15:56:24 +02:00
Reima Hyvönen
252bab8820
Added prints to bitrate and AVG QP
2019-10-31 15:56:24 +02:00
Pauli Oikkonen
6d7a4f555c
Also remove 16x16 (A * B^T)^T matrix multiply
...
Can be done using (B * A^T) instead, it's the exact same
2019-10-28 16:19:42 +02:00
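Note on the commit above: the identity behind the removal is (A * B^T)^T = (B^T)^T * A^T = B * A^T. A minimal sketch that checks it on small integer matrices follows; the size `N` and all function names are assumptions for illustration and are unrelated to the AVX2 implementation.

```c
#include <stdint.h>

#define N 2  /* illustrative size; the commit concerns 16x16 blocks */

/* out = a * b^T, i.e. out[i][j] = sum over k of a[i][k] * b[j][k] */
static void mul_abt(int32_t a[N][N], int32_t b[N][N], int32_t out[N][N])
{
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      int32_t sum = 0;
      for (int k = 0; k < N; k++) {
        sum += a[i][k] * b[j][k];
      }
      out[i][j] = sum;
    }
  }
}

/* out = in^T */
static void transpose(int32_t in[N][N], int32_t out[N][N])
{
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      out[j][i] = in[i][j];
    }
  }
}

/* Returns 1 if (a * b^T)^T equals b * a^T element by element, else 0. */
static int transposed_product_matches(int32_t a[N][N], int32_t b[N][N])
{
  int32_t abt[N][N], lhs[N][N], rhs[N][N];
  mul_abt(a, b, abt);   /* a * b^T                          */
  transpose(abt, lhs);  /* (a * b^T)^T                      */
  mul_abt(b, a, rhs);   /* b * a^T, operands simply swapped */
  for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
      if (lhs[i][j] != rhs[i][j]) return 0;
    }
  }
  return 1;
}
```

Swapping the operands of the existing A * B^T routine gives the transposed result directly, so no separate transposing multiply is needed.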
Pauli Oikkonen
2c2deb2366
Tidy AVX2 32x32 matrix multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
98ad78b333
Tidy the old AVX2 32x32 matrix multiply
...
It was actually a very good algorithm, just looked messy!
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
4a921cbdb5
Retain data as much in YMM registers as possible
...
This seems to make it a whole lot quicker
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
ac4d710e23
Unroll 32x32 matrix multiply, use all regs
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
a58608d0b8
Remove totally unnecessary (A * B^T)^T 32x32 multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
043f53539f
Implement a streamlined matrix-multiply 32x32 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e9da2d851b
Tidy 32x32 fast DCT's helper functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e382339182
Implement fast (butterfly) 32x32 DCT in AVX2
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
b5962dadac
Tidy indentation in AVX2 16x16 iDCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
36a8f89025
Fine-tune 16x16 AVX2 iDCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
ca9409de2b
Implement 16x16 DCT as butterfly algorithm in AVX2
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
7c69a26717
Use aligned loads and stores for AVX2 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
8e9c65dca6
Align DCT matrices and temp transform buffers
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
148a150522
Align DCT source and dest blocks to cache line
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
8e60bbf6a6
Slightly tune 16x16 forward DCT
...
Use an array of __m256i's to store temporary value, essentially letting
the compiler enforce alignment and use aligned loads and stores.
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
c0cc0e8a75
Optimize 16x16 multiply by only slicing right mat once
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e463d27f22
Implement streamlined generic 16x16 matrix multiply
...
It can't be this fast for real, can it?
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
beb85ce9d6
Reorder parameters for 8x8 matrix multiplies
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
292af62256
Implement tailored 16x16 forward DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
30ce461d98
Redo 4x4 matrix multiplication
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
07970ea82f
Streamline by-the-book 8x8 matrix multiplication
...
Also chop up the forward transform into two tailored multiply functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
7ec7ab3361
Implement a tailored AVX2 8x8 DCT
2019-10-28 16:19:42 +02:00
Joose Sainio
372934c7db
Fix division by zero
2019-10-10 16:35:56 +03:00
Joose Sainio
9bdfdeaf5c
Rest of the owl
2019-10-09 15:48:58 +03:00
Joose Sainio
1ba8525faf
WIP
2019-10-09 10:35:07 +03:00
Joose Sainio
19496d2692
?
2019-10-03 14:50:11 +03:00
Joose Sainio
4b111e339e
fix a couple of bugs in the implementation, bit calculation still seems a bit off
2019-10-01 15:08:39 +03:00
Joose Sainio
84615e406a
fix compiler warnings
2019-09-27 14:20:08 +03:00
Joose Sainio
14b7a75713
Call the new functions and fix bugs
2019-09-27 14:14:24 +03:00
Joose Sainio
ef74bfb182
unify naming
2019-09-27 10:16:21 +03:00
Joose Sainio
e36f481bda
qp calculation for frame
2019-09-27 09:05:40 +03:00
Joose Sainio
47019ca1cd
intra ck update
2019-09-26 16:04:53 +03:00
Joose Sainio
7c8f4da7cb
Update c and k except after first intra
2019-09-26 13:09:28 +03:00
Joose Sainio
0577d481c1
CTU level code
2019-09-25 12:12:21 +03:00
pkubaj
1d7fcf4227
Fix build on powerpc64 with LLVM
2019-09-12 15:05:00 +02:00
mercat
0de567bfa4
Fix memory leak
2019-09-12 09:45:32 +03:00
mercat
fa116de619
Add static
2019-09-11 16:18:12 +03:00
mercat
b8753a9293
Fucking INLINE fixed
2019-09-11 16:12:07 +03:00
mercat
b855144e68
INLINE fix
2019-09-11 16:12:07 +03:00
mercat
694337b803
Add const and more const
2019-09-11 16:12:07 +03:00
mercat
21c07638ed
Remove const in kvz_init_constraint.
2019-09-11 16:12:06 +03:00
mercat
2bca507abe
Clean version of machine learning constraint code. (ICIP paper)
2019-09-11 16:12:06 +03:00
Alexandre Mercat
0f4b7be6ee
First version of ML ICIP code for master
2019-09-11 16:12:06 +03:00
Pauli Oikkonen
99597b828a
Work around the ancient Win32 calling convention hassle
...
See if this'll work now
2019-09-06 13:14:42 +03:00
Pauli Oikkonen
c5ca18950c
Revert "Revert to 6924d90052 due to broken visual studio build"
...
This reverts commit 1dd0619bd7.
2019-09-05 18:21:55 +03:00
Pauli Oikkonen
55529decd5
Implement _mm256_insert_epi32 and extract pseudo-ops
...
Visual Studio headers apparently lack these guys
2019-09-05 18:20:52 +03:00
Ari Lemmetti
147378e1f9
Prevent 8x4 and 4x8 bipred in merge analysis
2019-09-03 16:32:50 +03:00
Ari Lemmetti
ef1fdbf259
Separate prediction of single PU/PB from CU/CB
2019-09-03 16:32:50 +03:00
Joose Sainio
7d2737bdf6
WIP picture lambda calculation
2019-09-03 11:03:35 +03:00
Ari Lemmetti
3bc510712f
Enable merge analysis for smp and amp
2019-09-02 17:31:51 +03:00
Ari Lemmetti
557bcbc6aa
Make luma or chroma only inter "recon" or predict possible
2019-09-02 17:15:28 +03:00
Joose Sainio
131c04f65c
Fix incorrect weight for intra frame
2019-08-29 12:01:13 +03:00
Joose Sainio
8f96678d13
Fix issue with intra frames being part of gop when they shouldn't
2019-08-29 09:28:10 +03:00
Ari Lemmetti
aa8ab195d1
Compare rough cost of the best merge mode against AMVP to make mode decision
2019-08-26 22:49:09 +03:00
Ari Lemmetti
8f866ff83a
Use correct index
2019-08-26 20:10:10 +03:00
Ari Lemmetti
2343958a14
Fix transform split for small luma blocks
2019-08-24 21:50:17 +03:00
Ari Lemmetti
800fc8644d
Reset CBFs because they might have been set for an earlier depth.
2019-08-24 21:49:33 +03:00
Ari Lemmetti
a80de22bc7
Add only different candidates to the list
2019-08-24 21:49:33 +03:00
Ari Lemmetti
45c7961412
Remove tr depth fill. It should not be needed.
2019-08-24 21:49:32 +03:00
Ari Lemmetti
ff8711aaab
Add missing logic to add valid indices to list
2019-08-24 21:49:29 +03:00
Ari Lemmetti
1dd0619bd7
Revert to 6924d90052 due to broken visual studio build
2019-08-08 15:15:34 +03:00
Pauli Oikkonen
2852baa673
Separate sign3_diff_epu8 from calc_eo_cat
...
Just to keep things simple, clear and obvious
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
17947b79ee
Add sao_shared_generics.h in Makefile.am
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a8dd6ce351
Add a note about having implemented a separate AVX2 version of SAO offset array calculation
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a858e7dd4b
Combine duplicate code into inline functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
de0e97f711
Take 8/16/24b loads and stores into separate functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
10979f58fe
Tidy up code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
9cc11976c0
Combine the delta accumulation from edge and band ddistortion into shared func
...
This won't reduce object size, but there'll be less duplicate code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
55d877bd66
Vectorize sao_edge_ddistortion
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
aef0f301d3
Fix function signatures
...
Mark anything intended as read-only to be const, and fix alignment
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
997fd369b3
Redo calc_sao_edge_dir_avx2
...
Do it wider, 32 pixels at once!
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
db1e475e02
Use i32 instead of i8 for x/y offsets
...
Doesn't matter too much, because this number isn't used in SIMD
computation, only as a memory reference offset.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
12de466ef5
Reimplement non-band SAO color reconstruction in AVX2
...
Streamline things to work on 32 pixels at once instead of 8
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
e8bff99329
Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction
...
Vectorize it all, hope this helps with perf
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
7b5dffa855
Implement calc_sao_offset_array in AVX2
...
To be efficient, the AVX2 color reconstruction algorithm will need
offsets in byte, not dword, arrays. This is completely specific to 8-bit
pixels and the function signature is fundamentally distinct from the
generic algorithm, so it's better to not strategize SAO offset array
calculation.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
29563b7039
Make kvz_calc_sao_offset_array more obvious
...
Name temporary values from array lookups etc. that are referred to multiple
times, to make the behavior of the mechanism more transparent. Define
all the constant values at the beginning of the function and declare as
const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
08881f5e9b
(TEMP) (TODO) (whatever) Avoid compiler warnings
...
I want the CI to not crash on its -Wall -Werror, but instead to actually
build the thing and report actual memory errors etc. to me
2019-08-07 16:35:24 +03:00