siivonek
ad40a88c09
Add no-vaq option to vaq
2020-02-18 13:20:25 +02:00
siivonek
09f0a1c52e
Fix typo in comment
2020-02-18 13:20:25 +02:00
siivonek
84fb3fd7d1
aq: Add --vaq commandline option
2020-02-18 13:20:25 +02:00
Joose Sainio
2a98f5db1e
fix intra-bits for lp-gop
2020-02-18 10:38:29 +02:00
Ari Lemmetti
71d9327f62
Further improve fast bipred
2020-02-17 20:32:52 +02:00
Ari Lemmetti
80c26870d5
Update docs
2020-02-15 23:29:18 +02:00
Ari Lemmetti
ebb183cc01
Add option to make intra QP offset configurable
2020-02-15 22:54:48 +02:00
Ari Lemmetti
be3e08d6db
Add gop.h to Makefile
2020-02-15 22:54:47 +02:00
Ari Lemmetti
1354acd358
Prevent negative values being written to SPS with --gop=0
2020-02-15 22:54:47 +02:00
Ari Lemmetti
fe4869916c
Disable GOP and intra qp offset for all-intra coding automatically
2020-02-15 22:54:46 +02:00
Ari Lemmetti
9849fb7c77
Enable experimental rate control for GOP 16
2020-02-15 22:54:46 +02:00
Ari Lemmetti
a0a22dec8a
Remove deprecated / unused lambda adjustments
2020-02-15 22:54:46 +02:00
Arttu Ylä-Outinen
829a70e6a7
Copy lowdelay GOP definition from HM
2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen
28f99c0b87
Change definition of 8-GOP to match HM
2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen
636fa8fbdd
Fix maximum decoded picture buffer size
2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen
ebd5156db5
Add definition for random access GOP of length 16
2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen
6653f06dd0
Only compute GOP layer weights when RC is enabled
2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen
c8fff1e0d6
Use a larger number of bits for POC lsb when needed
...
Changes the number of bits used for coding the least significant bits of
the POC based on the GOP size.
2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen
d757a832c2
Change GOP QP offset handling to match HM
...
Adds fields qp_model_scale and qp_model_offset to kvz_gop_config and
intra_qp_offset to kvz_config.
2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen
f37dcd5879
Move GOP definition to a separate file
...
Moves definition of the 8-GOP from cfg.c to gop.h.
2020-02-15 22:36:55 +02:00
Ari Lemmetti
6e1007a3e7
Get rid of LAMBA! (Commit #3000 )
2020-02-15 22:32:52 +02:00
Ari Lemmetti
0c02e71b43
Remove minor error from readme
2020-02-15 22:29:08 +02:00
Joose Sainio
e90d3141a2
Merge branch 'master' into rc-intra
2020-02-05 11:06:56 +02:00
Ari Lemmetti
9a0236bb4e
Add option 'zero-coeff-rdo'
2020-02-04 21:26:29 +02:00
Ari Lemmetti
886ff36d12
Initial implementation of fast bipred.
2020-02-04 15:46:23 +02:00
Ari Lemmetti
3c7dd0752f
Remove the broken "no mov" branch.
...
Causes hash mismatches for example in SlideShow sequence.
2020-02-03 15:26:31 +02:00
RLamm
bf8941ddb8
Added comment about partial-coding usage
2020-01-31 16:19:48 +02:00
RLamm
b8488ab48d
Changed "partial-coding" variables to uint32_t
2020-01-31 16:02:29 +02:00
RLamm
76e3249754
Changed parameter "slicer" to "partial-coding" to avoid confusion.
2020-01-31 14:22:32 +02:00
RLamm
30d5df40c5
Custom headers for the distributed coding
2020-01-29 15:54:49 +02:00
Joose Sainio
54571529a4
Fix accessing previous frame that didn't exist
2020-01-17 10:48:35 +02:00
Joose Sainio
5c671d20e1
Use the new clipping only in situations where it actually helps
2020-01-17 09:08:21 +02:00
Joose Sainio
3c34d7c863
Fix qp estimation and checking of previous frames that dont exist
2020-01-15 09:32:04 +02:00
Joose Sainio
1a35c22a52
Change clipping of lambda and qp for ctus on OBA rc
...
instead of clipping qp and lambda to the value of last value from the state
clip to previous frame with same layer and if such frame doesn't exist, clip
to previous frame
2020-01-14 14:46:05 +02:00
Pauli Oikkonen
c3d9e97e9f
Fix VS build
2019-12-12 18:34:55 +02:00
Pauli Oikkonen
7f238ca299
Remove debug print functions
...
Whoops
2019-12-12 18:19:31 +02:00
Pauli Oikkonen
eefb5e50b3
De-inline pred_filtered_dc functions, shouldn't make much difference though
2019-12-12 17:30:00 +02:00
Pauli Oikkonen
169314de4f
32x32 filtered DC prediction in AVX2
2019-12-11 18:17:06 +02:00
Pauli Oikkonen
fb2481b7e4
16x16 filtered DC implemented in AVX2
2019-12-10 15:54:50 +02:00
Joose Sainio
b78aa7b272
save c and k to frame
2019-12-06 10:52:54 +02:00
Joose Sainio
5b10e5fb7e
parameterize the clipping option
2019-12-06 09:51:04 +02:00
Pauli Oikkonen
da370ea36d
Implement AVX2 8x8 filtered DC algorithm
2019-11-28 14:10:10 +02:00
Pauli Oikkonen
5d9b7019ca
Implement a 4x4 filtered DC pred function
2019-11-26 17:05:54 +02:00
Joose Sainio
ca0060cbba
try the original clipping
2019-11-26 15:13:04 +02:00
Pauli Oikkonen
f1485ab087
Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes?
2019-11-25 15:20:29 +02:00
Joose Sainio
ab2fded8af
Update threadwrapper to enable pthread_rwlock_t
2019-11-21 13:38:40 +02:00
Joose Sainio
eb78aead1f
Fix additional potential data races
2019-11-21 11:03:12 +02:00
Joose Sainio
35d7e0d88b
Fix data race
2019-11-21 10:25:04 +02:00
Pauli Oikkonen
979d66031c
Create a strategy out of intra_pred_filtered_dc
2019-11-19 14:50:31 +02:00
Joose Sainio
0e8815a3d8
test clipping qp to previous frame instead of previous ctus
2019-11-19 14:32:31 +02:00
Joose Sainio
ddb4e5a131
move the intra bit calculation so that it is used also with lambda rc
2019-11-19 14:16:48 +02:00
Joose Sainio
a07833f3e6
check that mallocs in rc initialization were successful
...
only call kvz_update_after_picture when using the OBA rc
2019-11-19 13:59:44 +02:00
Joose Sainio
50d410a316
re-enable static qp encoding and lambda rc
2019-11-19 13:45:58 +02:00
Pauli Oikkonen
fa4bb86406
Optimize intra_pred_planar_avx2 for 4x4 blocks
2019-11-19 13:39:02 +02:00
Joose Sainio
57e5615ece
Fix incorrect intra rc calculation skipping
2019-11-19 13:25:31 +02:00
Joose Sainio
6cc3bcd87e
Command line parameters for oba rc and implementation of the usage of the intra parameter
2019-11-19 09:29:06 +02:00
Joose Sainio
eb73548af5
Encode first frame completely before starting others to enable owf
2019-11-18 09:51:37 +02:00
Pauli Oikkonen
4761d228f9
Start to vectorize the 4x4 loop
2019-11-15 17:32:40 +02:00
Pauli Oikkonen
8d45ab4951
Stupidify the 4x4 planar loop for vectorization
2019-11-14 17:14:04 +02:00
Joose Sainio
c759c138ed
Prepare the rc data structure to be shared among all frame encoders
2019-11-13 11:56:25 +02:00
Joose Sainio
cdb7c851a4
Fix weight calculation
2019-11-13 08:55:31 +02:00
Joose Sainio
b9b01f8036
WPP with threading
2019-11-12 12:12:57 +02:00
Joose Sainio
615973adca
should enable threading with wpp when owf is not used
2019-11-12 09:03:00 +02:00
Pauli Oikkonen
6f13f6525c
Merge branch 'new_prints'
2019-11-07 17:04:21 +02:00
Joose Sainio
d353f7dd1a
Disable debug prints, fix multiple bugs in the calculation
2019-11-07 15:08:57 +02:00
mercat
57e8c3ebc2
Merge branch 'ML-cplx_red_ICIP'
2019-11-07 13:25:47 +02:00
Pauli Oikkonen
558f0ec401
Mbps, not mbps
2019-11-05 18:06:00 +02:00
Pauli Oikkonen
2edf533925
Tidy the end report printing
...
Also fix a bug with non-integer target FPS
2019-11-05 17:20:00 +02:00
Joose Sainio
408fd4ccb6
Fix lambda and qp calcualtion for intra frames
...
also fixes a bug with selecting the clip neighbor lambda and clip neighbor qp
selection for inter frames
2019-11-05 10:51:39 +02:00
Pauli Oikkonen
c7313ce567
Store AVG QP information in encmain
2019-11-04 17:08:07 +02:00
Reima Hyvönen
80575c59bf
Some updates done to get right bitrate and avg QP
2019-10-31 15:56:24 +02:00
Reima Hyvönen
252bab8820
Added prints to bitrate and AVG QP
2019-10-31 15:56:24 +02:00
Pauli Oikkonen
6d7a4f555c
Also remove 16x16 (A * B^T)^T matrix multiply
...
Can be done using (B * A^T) instead, it's the exact same
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
2c2deb2366
Tidy AVX2 32x32 matrix multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
98ad78b333
Tidy the old AVX2 32x32 matrix multiply
...
It was actually a very good algorithm, just looked messy!
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
4a921cbdb5
Retain data as much in YMM registers as possible
...
This seems to make it a whole lot quicker
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
ac4d710e23
Unroll 32x32 matrix multiply, use all regs
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
a58608d0b8
Remove totally unnecessary (A * B^T)^T 32x32 multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
043f53539f
Implement a streamlined matrix-multiply 32x32 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e9da2d851b
Tidy 32x32 fast DCT's helper functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e382339182
Implement fast (butterfly) 32x32 DCT in AVX2
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
b5962dadac
Tidy indentation in AVX2 16x16 iDCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
36a8f89025
Fine-tune 16x16 AVX2 iDCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
ca9409de2b
Implement 16x16 DCT as butterfly algorithm in AVX2
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
7c69a26717
Use aligned loads and stores for AVX2 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
8e9c65dca6
Align DCT matrices and temp transform buffers
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
148a150522
Align DCT source and dest blocks to cache line
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
8e60bbf6a6
Slightly tune 16x16 forward DCT
...
Use an array of __m256i's to store temporary value, essentially letting
the compiler enforce alignment and use aligned loads and stores.
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
c0cc0e8a75
Optimize 16x16 multiply by only slicing right mat once
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e463d27f22
Implement streamlined generic 16x16 matrix multiply
...
It can't be this fast for real, can it?
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
beb85ce9d6
Reorder parameters for 8x8 matrix multiplies
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
292af62256
Implement tailored 16x16 forward DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
30ce461d98
Redo 4x4 matrix multiplication
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
07970ea82f
Streamline by-the-book 8x8 matrix multiplication
...
Also chop up the forward transform into two tailored multiply functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
7ec7ab3361
Implement a tailored AVX2 8x8 DCT
2019-10-28 16:19:42 +02:00
Joose Sainio
372934c7db
Fix division by zero
2019-10-10 16:35:56 +03:00
Joose Sainio
9bdfdeaf5c
Rest of the owl
2019-10-09 15:48:58 +03:00
Joose Sainio
1ba8525faf
WIP
2019-10-09 10:35:07 +03:00
Joose Sainio
19496d2692
?
2019-10-03 14:50:11 +03:00
Joose Sainio
4b111e339e
fix couple of bugs in the implementation, bit calculation seems still bit off
2019-10-01 15:08:39 +03:00
Joose Sainio
84615e406a
fix compiler warnings
2019-09-27 14:20:08 +03:00
Joose Sainio
14b7a75713
Call the new functions and fix bugs
2019-09-27 14:14:24 +03:00
Joose Sainio
ef74bfb182
unify naming
2019-09-27 10:16:21 +03:00
Joose Sainio
e36f481bda
qp calculation for frame
2019-09-27 09:05:40 +03:00
Joose Sainio
47019ca1cd
intra ck update
2019-09-26 16:04:53 +03:00
Joose Sainio
7c8f4da7cb
Update c and k except after first intra
2019-09-26 13:09:28 +03:00
Joose Sainio
0577d481c1
CTU level code
2019-09-25 12:12:21 +03:00
pkubaj
1d7fcf4227
Fix build on powerpc64 with LLVM
2019-09-12 15:05:00 +02:00
mercat
0de567bfa4
Fixe memory leak
2019-09-12 09:45:32 +03:00
mercat
fa116de619
Add static
2019-09-11 16:18:12 +03:00
mercat
b8753a9293
Fucking INLINE fixed
2019-09-11 16:12:07 +03:00
mercat
b855144e68
INLINE fixe
2019-09-11 16:12:07 +03:00
mercat
694337b803
Add const and more const
2019-09-11 16:12:07 +03:00
mercat
21c07638ed
Remove const into kvz_init_constraint.
2019-09-11 16:12:06 +03:00
mercat
2bca507abe
Clean version of machine learning constraint code. (ICIP paper)
2019-09-11 16:12:06 +03:00
Alexandre Mercat
0f4b7be6ee
First version of ML ICIP code for master
2019-09-11 16:12:06 +03:00
Pauli Oikkonen
99597b828a
Work around the ancient Win32 calling convention hassle
...
See if this'll work now
2019-09-06 13:14:42 +03:00
Pauli Oikkonen
c5ca18950c
Revert "Revert to 6924d90052
due to broken visual studio build"
...
This reverts commit 1dd0619bd7
.
2019-09-05 18:21:55 +03:00
Pauli Oikkonen
55529decd5
Implement _mm256_insert_epi32 and extract pseudo-ops
...
Visual Studio headers apparently lack these guys
2019-09-05 18:20:52 +03:00
Ari Lemmetti
147378e1f9
Prevent 8x4 and 4x8 bipred in merge analysis
2019-09-03 16:32:50 +03:00
Ari Lemmetti
ef1fdbf259
Separate prediction of single PU/PB from CU/CB
2019-09-03 16:32:50 +03:00
Joose Sainio
7d2737bdf6
WIP picture lambda calculation
2019-09-03 11:03:35 +03:00
Ari Lemmetti
3bc510712f
Enable merge analysis for smp and amp
2019-09-02 17:31:51 +03:00
Ari Lemmetti
557bcbc6aa
Make luma or chroma only inter "recon" or predict possible
2019-09-02 17:15:28 +03:00
Joose Sainio
131c04f65c
Fix incorrect weight for intra frame
2019-08-29 12:01:13 +03:00
Joose Sainio
8f96678d13
Fix issue with intra frames being part of gop when they shouldn't
2019-08-29 09:28:10 +03:00
Ari Lemmetti
aa8ab195d1
Compare rough cost of the best merge mode against AMVP to make mode decision
2019-08-26 22:49:09 +03:00
Ari Lemmetti
8f866ff83a
Use correct index
2019-08-26 20:10:10 +03:00
Ari Lemmetti
2343958a14
Fix transform split for small luma blocks
2019-08-24 21:50:17 +03:00
Ari Lemmetti
800fc8644d
Reset CBFs because CBFs might have been set earlier for depth earlier.
2019-08-24 21:49:33 +03:00
Ari Lemmetti
a80de22bc7
Add only different candidates to the list
2019-08-24 21:49:33 +03:00
Ari Lemmetti
45c7961412
Remove tr depth fill. It should not be needed.
2019-08-24 21:49:32 +03:00
Ari Lemmetti
ff8711aaab
Add missing logic to add valid indices to list
2019-08-24 21:49:29 +03:00
Ari Lemmetti
1dd0619bd7
Revert to 6924d90052
due to broken visual studio build
2019-08-08 15:15:34 +03:00
Pauli Oikkonen
2852baa673
Separate sign3_diff_epu8 from calc_eo_cat
...
Just to keep things simple, clear and obvious
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
17947b79ee
Add sao_shared_generics.h in Makefile.am
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a8dd6ce351
Add a note about having implemented a separate AVX2 version of SAO offset array calculation
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a858e7dd4b
Combine duplicate code into inline functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
de0e97f711
Take 8/16/24b loads and stores into separate functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
10979f58fe
Tidy up code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
9cc11976c0
Combine the delta accumulation from edge and band ddistortion into shared func
...
This won't reduce object size, but there'll be less duplicate code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
55d877bd66
Vectorize sao_edge_ddistortion
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
aef0f301d3
Fix function signatures
...
Mark anything intended as read-only to be const, and fix alignment
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
997fd369b3
Redo calc_sao_edge_dir_avx2
...
Do it wider, 32 pixels at once!
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
db1e475e02
Use i32 instead of i8 for x/y offsets
...
Doesn't matter too much, because this number isn't used in SIMD
computation, only as a memory reference offset.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
12de466ef5
Reimplement non-band SAO color reconstruction in AVX2
...
Streamline things to work on 32 pixels at once instead of 8
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
e8bff99329
Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction
...
Vectorize it all, hope this helps with perf
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
7b5dffa855
Implement calc_sao_offset_array in AVX2
...
To be efficient, the AVX2 color reconstruction algorithm will need
offsets in byte, not dword, arrays. This is completely specific to 8-bit
pixels and the function signature is fundamentally distinct from the
generic algorithm, so it's better to not strategize SAO offset array
calculation.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
29563b7039
Make kvz_calc_sao_offset_array more obvious
...
Name temporary values from array lookups etc that are referred multiple
times to, to make the behavior of the mechanism more transparent. Define
all the constant values at the beginning of the function and declare as
const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
08881f5e9b
(TEMP) (TODO) (whatever) Avoid compiler warnings
...
I want the CI to not crash on its -Wall -Werror, but instead to actually
build the thing and report me about actual memory errors etc
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
c18adc5ee0
Redo sao_band_ddistortion_avx2
...
Avoid branching and do the entire thing on 32 pixels at once in YMMs.
Also make the sao_bands function parameter const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
2827c3e3ab
Make calc_sao_bands less opaque
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
1bb9a079a8
Fix indentation
2019-08-07 16:35:24 +03:00
Reima Hyvönen
7bc959c7c5
3 sao functions are now working
2019-08-07 16:35:24 +03:00
Reima Hyvönen
0e0f2d3490
made to clear sum vector after it has been set to memory
2019-08-07 16:35:24 +03:00
Reima Hyvönen
f146de7acb
removed some variables to prevent memory losses
2019-08-07 16:35:24 +03:00
Reima Hyvönen
247c3a7a71
conversed gined to unsigned int
2019-08-07 16:35:24 +03:00
Reima Hyvönen
ac5c216974
Some more memory error preventing to sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3fb1cbca35
more editing sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
afbb6fb960
some more modifications to sao_edge_ddistortion_avx2 to prevent memory failures
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3496a57f7a
Edited sao_edge_ddistortion_avx2 to avoid memory overflow
2019-08-07 16:35:24 +03:00
Reima Hyvönen
267ba1d6ce
Modified sao_band_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
e70663b245
added some sub commands to avoid memory read errors
2019-08-07 16:35:24 +03:00
Reima Hyvönen
59dfb4570c
Converted some loads to load int8_t instead ints
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8b253209a8
Found false address load from calc_sao_edge_dir. Should now work like generic
2019-08-07 16:35:24 +03:00
Reima Hyvönen
50e0a47b7a
Took away __restrict
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8a39eb674e
Removed c-variable from calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
bc0a36830d
Clerified some 6 pixel loads
2019-08-07 16:35:24 +03:00
Reima Hyvönen
1a8b211e05
Added break to line 170
2019-08-07 16:35:24 +03:00
Reima Hyvönen
d05e750ebe
Added some switches to prevent segmentation fault from reading
2019-08-07 16:35:24 +03:00
Reima Hyvönen
203580047d
Defined some AVX functions
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c884c738b1
Updated some commands to match the standard
2019-08-07 16:35:24 +03:00
Reima Hyvönen
b412ed2f59
Removed some setr and used loads calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c6cc063534
converted some hadd operations at calc_sao_edge_dir_avx2 to cast and extract
2019-08-07 16:35:24 +03:00
Reima Hyvönen
47ac109b10
optimated some sao_reconstruct_color_avx2 when sao->type == SAO_TYPE_BAND
2019-08-07 16:35:24 +03:00
Reima Hyvönen
96dc60a1ed
first working optimation
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c148aff9fb
Some optimation done to function sao_reconstruct_color_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
bf16ba6cc4
Remade sao_edge_ddistortion_avx2 and calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
79dc39a676
Some editing for sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
06ee52924e
some reconst done to calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
5fbc65d823
reconst optimation doesn't work yet
2019-08-07 16:35:24 +03:00
Reima Hyvönen
d29f834a69
Remove useless function
2019-08-07 16:35:24 +03:00
Reima Hyvönen
a232a12160
calc_sao_edge_dir_avx2 updated
2019-08-07 16:35:24 +03:00
Reima Hyvönen
b1febc02a5
sao_edge_ddistortion_avx2 now working proberly
2019-08-07 16:35:24 +03:00
Reima Hyvönen
cd6092a1ec
Still too much bits, looking for where they appear
2019-08-07 16:35:24 +03:00
Reima Hyvönen
7853be8eeb
Incomple optimation
2019-08-07 16:35:24 +03:00
Ari Lemmetti
40609aa865
Add missing headers to Makefile.am
2019-07-12 19:15:51 +03:00
Ari Lemmetti
5db3a78499
Bump versions for release 1.3
2019-07-09 22:09:32 +03:00
Ari Lemmetti
d513ab1999
Add missing newline
2019-07-09 21:06:05 +03:00
Ari Lemmetti
4967072625
Do not bypass search on skip cu if early_skip is not enabled
2019-07-09 20:20:12 +03:00
Ari Lemmetti
b20992a9f3
Rename functions more descriptive
2019-07-09 20:20:11 +03:00
Ari Lemmetti
a348a0ec23
Fix transform depth in early skip
2019-07-09 20:05:48 +03:00
Pauli Oikkonen
8d48bee180
Tidy fast coeff cost code
2019-07-09 18:01:54 +03:00
Pauli Oikkonen
201a43b08e
Clean up the RD-estimation code
2019-07-09 18:01:54 +03:00
Pauli Oikkonen
b111df5073
Create preliminary version of improved cost estimator
2019-07-09 18:01:54 +03:00
Ari Lemmetti
be08a87d94
Add missing parameter max-merge to the help message
2019-07-09 16:28:46 +03:00
Ari Lemmetti
d0bb9b4a6d
Add parameter max-merge to presets
2019-07-09 16:26:03 +03:00
Ari Lemmetti
4097331fd6
Early skip
2019-07-09 15:59:31 +03:00
Joose Sainio
977e885ea2
Fix issue with gop=0 introduced in 1c36f68d0c
2019-07-05 12:57:27 +03:00
Mikko Pitkänen
a7f09c8114
Merge branch 'threadwrapper'
2019-06-24 16:54:59 +03:00
Mikko Pitkänen
3dd606ce2e
Add new threadwrapper
2019-06-18 18:45:45 +03:00
Joose Sainio
c94077d15e
remove hardcoded value
2019-06-12 14:37:41 +03:00
Joose Sainio
ac68c8444d
remove negation that wasn't supposed to be there
2019-06-12 14:35:24 +03:00
Joose Sainio
5851dcc3be
missing negation
2019-06-12 14:08:18 +03:00
Joose Sainio
1c36f68d0c
Fix owf>=9 gop=8 and add test to catch such problem in future
2019-06-12 14:04:41 +03:00
Ari Lemmetti
933ff6ed55
Merge branch 'set-qp-in-cu-fix'
2019-06-07 09:01:03 +03:00
Ari Lemmetti
c6da839002
Set lcu sqrt lambda according to lcu lambda instead of frame lambda when ROI is used
2019-05-29 18:32:10 +03:00
Ari Lemmetti
9339845e8b
Set QP completely at CU level as the name '--set-qp-in-cu' implies
...
-Move slice delta QP to CU level when using --set-qp-in-cu
-Separate functionality from roi
2019-05-24 20:38:39 +03:00
Pauli Oikkonen
081d16fc33
Fix intrinsics that may be missing on some systems
...
Create a header to collect all the workarounds for missing intrinsics
in one place
2019-05-23 19:59:40 +03:00
Pauli Oikkonen
87a9208db8
Eliminate cvtsi64_si128 intrinsic
...
Apparently it'll cause Win32 builds to break because it emits the movq
instruction or something..
2019-04-17 16:30:40 +03:00
Pauli Oikkonen
7175d20bb2
Still include stdint.h for non-vector builds
2019-04-15 19:36:01 +03:00
Pauli Oikkonen
1315c7e2b0
Do not compile any vector code for non-SSE4/AVX2 builds
2019-04-15 19:10:48 +03:00
Pauli Oikkonen
f5f70e7bc5
Merge branch 'sad-optimization'
2019-04-15 19:02:01 +03:00
Jan Beich
85f46e17a9
Detect AltiVec via elf_aux_info() on FreeBSD 12+
2019-04-01 13:08:04 +00:00
Jan Beich
82486255da
Simplify AltiVec detection on Linux
2019-04-01 13:08:04 +00:00
Pauli Oikkonen
6d43759604
Create a border-respecting 32-wide AVX hor_sad
2019-03-07 18:01:22 +02:00
Pauli Oikkonen
f218cecb38
Remove offending hor_sad_avx2_w32 function
...
Consider possibly creating a non-offending AVX2 version instead, the
way hor_sad_sse41_w32 works. Or maybe there's more essential work to
do.
2019-03-05 22:51:41 +02:00
Pauli Oikkonen
df2e6c54fd
4-unroll hor_sad_sse41_arbitrary
...
This may not increase perf though because it's so rarely used
function, so keeping icache footprint may be more essential...
2019-03-05 22:45:23 +02:00
Pauli Oikkonen
448eacba7b
Avoid overreading block borders in hor_sad_sse41_arbitrary
2019-03-05 22:34:50 +02:00
Eemeli Kallio
c159e275b7
Merge branch 'max_merge'
2019-03-05 14:39:03 +02:00
Pauli Oikkonen
41f51c08c4
Avoid overrunning buffer in hor_sad_sse41_w32
2019-03-01 15:37:38 +02:00
Pauli Oikkonen
bcd9879359
Include quant coeff range check in non-scaling list execution path too
2019-02-27 17:26:44 +02:00
Pauli Oikkonen
24e6363f64
Remove the kvz_quant_avx2 wrapper function
2019-02-27 16:32:58 +02:00
Pauli Oikkonen
748820f3c5
Eliminate unnecessary loading of coeffs if scaling lists are off
2019-02-27 16:26:35 +02:00
Pauli Oikkonen
5994350f40
Allow quant_flat_avx2 to be used with scaling lists on
2019-02-27 16:25:59 +02:00
Eemeli Kallio
7f4e0acf41
Added check if max-merge is out of bounds
2019-02-19 13:53:42 +02:00
Pauli Oikkonen
9b0e079262
Use SSE instructions for 64-bit SADs instead of MMX
...
VC++ seems to choke on MMX instructions
2019-02-18 20:13:33 +02:00
Pauli Oikkonen
d8b8923028
Add LGPL notices to reg_sad headers
2019-02-18 17:52:47 +02:00
Eemeli Kallio
2a40560888
some variables to const
2019-02-12 11:24:10 +02:00
Eemeli Kallio
8f8e7bb53c
Added possibility to reduce number of maximum number of merge candidates.
2019-02-12 09:21:03 +02:00
Pauli Oikkonen
770db825b9
Create hor_sad_w8 and w4 epol mask the way w16 works
2019-02-06 19:34:26 +02:00
Pauli Oikkonen
aa19bcac8a
Avoid branching in creating shuffle mask in hor_sad_w16
2019-02-06 18:58:46 +02:00
Pauli Oikkonen
2d05ca8520
Remove width from constant-width hor_sad func params
...
They should kinda know it already
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
57db234d95
Move 32-wide SSE4.1 hor_sad to picture-sse41.c
...
It's not used by picture-avx2.c that also includes the header, so
it should not be in the header
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
dd7d989a39
Implement 32-wide hor_sad on AVX2
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ff70c8a5ec
Utilize horizontal SAD functions for SSE4.1 as well
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
f5ff4db01f
4-wide hor_sad border agnostic
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
35e7f9a700
Fix hor_sad w8 to work with both borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
836783dd6e
Use hor_sad_w32 for both left and right borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
69687c8d24
Modify hor_sad_sse41_w16 to work over left and right borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
51c2abe99a
Modify image_interpolated_sad to use kvz_hor_sad
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
1e0eb1af30
Add generic strategy for hor_sad'ing an non-split width block
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
686fb2c957
Unroll arbitrary-width SSE4.1 hor_sad by 4
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
768203a2de
First version of arbitrary-width SSE4.1 hor_sad
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ccf683b9b6
Start work on left and right border aware hor_sad
...
Comes with 4, 8, 16 and 32 pixel wide implementations now, at some point
investigate if this can start to thrash icache
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
760bd0397d
Pad the image buffer by 64 bytes from both ends
...
This will be necessary for an efficient and straightforward
implementation of hor_sad for blocks over 16 pixels wide, because they
cannot use the shuffle trick because inter-lane shuffling is so hard to
do
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
c36482a11a
Fix bug in 24-wide SAD
...
*facepalm*
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
f781dc31f0
Create strategy for ver_sad
...
Easy to vectorize
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ca94ae9529
Handle extrapolated blocks with unmodified width using optimized_sad pointer
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
91b30c7064
Tidy up kvz_image_calc_sad
2019-02-04 20:41:40 +02:00