Joose Sainio
47019ca1cd
intra ck update
2019-09-26 16:04:53 +03:00
Joose Sainio
7c8f4da7cb
Update c and k except after first intra
2019-09-26 13:09:28 +03:00
Joose Sainio
0577d481c1
CTU level code
2019-09-25 12:12:21 +03:00
pkubaj
1d7fcf4227
Fix build on powerpc64 with LLVM
2019-09-12 15:05:00 +02:00
mercat
0de567bfa4
Fixe memory leak
2019-09-12 09:45:32 +03:00
mercat
fa116de619
Add static
2019-09-11 16:18:12 +03:00
mercat
b8753a9293
Fucking INLINE fixed
2019-09-11 16:12:07 +03:00
mercat
b855144e68
INLINE fixe
2019-09-11 16:12:07 +03:00
mercat
694337b803
Add const and more const
2019-09-11 16:12:07 +03:00
mercat
21c07638ed
Remove const into kvz_init_constraint.
2019-09-11 16:12:06 +03:00
mercat
2bca507abe
Clean version of machine learning constraint code. (ICIP paper)
2019-09-11 16:12:06 +03:00
Alexandre Mercat
0f4b7be6ee
First version of ML ICIP code for master
2019-09-11 16:12:06 +03:00
Pauli Oikkonen
99597b828a
Work around the ancient Win32 calling convention hassle
...
See if this'll work now
2019-09-06 13:14:42 +03:00
Pauli Oikkonen
c5ca18950c
Revert "Revert to 6924d90052
due to broken visual studio build"
...
This reverts commit 1dd0619bd7
.
2019-09-05 18:21:55 +03:00
Pauli Oikkonen
55529decd5
Implement _mm256_insert_epi32 and extract pseudo-ops
...
Visual Studio headers apparently lack these guys
2019-09-05 18:20:52 +03:00
Ari Lemmetti
147378e1f9
Prevent 8x4 and 4x8 bipred in merge analysis
2019-09-03 16:32:50 +03:00
Ari Lemmetti
ef1fdbf259
Separate prediction of single PU/PB from CU/CB
2019-09-03 16:32:50 +03:00
Joose Sainio
7d2737bdf6
WIP picture lambda calculation
2019-09-03 11:03:35 +03:00
Ari Lemmetti
3bc510712f
Enable merge analysis for smp and amp
2019-09-02 17:31:51 +03:00
Ari Lemmetti
557bcbc6aa
Make luma or chroma only inter "recon" or predict possible
2019-09-02 17:15:28 +03:00
Joose Sainio
131c04f65c
Fix incorrect weight for intra frame
2019-08-29 12:01:13 +03:00
Joose Sainio
8f96678d13
Fix issue with intra frames being part of gop when they shouldn't
2019-08-29 09:28:10 +03:00
Ari Lemmetti
aa8ab195d1
Compare rough cost of the best merge mode against AMVP to make mode decision
2019-08-26 22:49:09 +03:00
Ari Lemmetti
8f866ff83a
Use correct index
2019-08-26 20:10:10 +03:00
Ari Lemmetti
2343958a14
Fix transform split for small luma blocks
2019-08-24 21:50:17 +03:00
Ari Lemmetti
800fc8644d
Reset CBFs because CBFs might have been set earlier for depth earlier.
2019-08-24 21:49:33 +03:00
Ari Lemmetti
a80de22bc7
Add only different candidates to the list
2019-08-24 21:49:33 +03:00
Ari Lemmetti
45c7961412
Remove tr depth fill. It should not be needed.
2019-08-24 21:49:32 +03:00
Ari Lemmetti
ff8711aaab
Add missing logic to add valid indices to list
2019-08-24 21:49:29 +03:00
Ari Lemmetti
1dd0619bd7
Revert to 6924d90052
due to broken visual studio build
2019-08-08 15:15:34 +03:00
Pauli Oikkonen
2852baa673
Separate sign3_diff_epu8 from calc_eo_cat
...
Just to keep things simple, clear and obvious
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
17947b79ee
Add sao_shared_generics.h in Makefile.am
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a8dd6ce351
Add a note about having implemented a separate AVX2 version of SAO offset array calculation
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a858e7dd4b
Combine duplicate code into inline functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
de0e97f711
Take 8/16/24b loads and stores into separate functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
10979f58fe
Tidy up code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
9cc11976c0
Combine the delta accumulation from edge and band ddistortion into shared func
...
This won't reduce object size, but there'll be less duplicate code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
55d877bd66
Vectorize sao_edge_ddistortion
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
aef0f301d3
Fix function signatures
...
Mark anything intended as read-only to be const, and fix alignment
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
997fd369b3
Redo calc_sao_edge_dir_avx2
...
Do it wider, 32 pixels at once!
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
db1e475e02
Use i32 instead of i8 for x/y offsets
...
Doesn't matter too much, because this number isn't used in SIMD
computation, only as a memory reference offset.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
12de466ef5
Reimplement non-band SAO color reconstruction in AVX2
...
Streamline things to work on 32 pixels at once instead of 8
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
e8bff99329
Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction
...
Vectorize it all, hope this helps with perf
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
7b5dffa855
Implement calc_sao_offset_array in AVX2
...
To be efficient, the AVX2 color reconstruction algorithm will need
offsets in byte, not dword, arrays. This is completely specific to 8-bit
pixels and the function signature is fundamentally distinct from the
generic algorithm, so it's better to not strategize SAO offset array
calculation.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
29563b7039
Make kvz_calc_sao_offset_array more obvious
...
Name temporary values from array lookups etc that are referred multiple
times to, to make the behavior of the mechanism more transparent. Define
all the constant values at the beginning of the function and declare as
const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
08881f5e9b
(TEMP) (TODO) (whatever) Avoid compiler warnings
...
I want the CI to not crash on its -Wall -Werror, but instead to actually
build the thing and report me about actual memory errors etc
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
c18adc5ee0
Redo sao_band_ddistortion_avx2
...
Avoid branching and do the entire thing on 32 pixels at once in YMMs.
Also make the sao_bands function parameter const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
2827c3e3ab
Make calc_sao_bands less opaque
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
1bb9a079a8
Fix indentation
2019-08-07 16:35:24 +03:00
Reima Hyvönen
7bc959c7c5
3 sao functions are now working
2019-08-07 16:35:24 +03:00
Reima Hyvönen
0e0f2d3490
made to clear sum vector after it has been set to memory
2019-08-07 16:35:24 +03:00
Reima Hyvönen
f146de7acb
removed some variables to prevent memory losses
2019-08-07 16:35:24 +03:00
Reima Hyvönen
247c3a7a71
conversed gined to unsigned int
2019-08-07 16:35:24 +03:00
Reima Hyvönen
ac5c216974
Some more memory error preventing to sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3fb1cbca35
more editing sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
afbb6fb960
some more modifications to sao_edge_ddistortion_avx2 to prevent memory failures
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3496a57f7a
Edited sao_edge_ddistortion_avx2 to avoid memory overflow
2019-08-07 16:35:24 +03:00
Reima Hyvönen
267ba1d6ce
Modified sao_band_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
e70663b245
added some sub commands to avoid memory read errors
2019-08-07 16:35:24 +03:00
Reima Hyvönen
59dfb4570c
Converted some loads to load int8_t instead ints
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8b253209a8
Found false address load from calc_sao_edge_dir. Should now work like generic
2019-08-07 16:35:24 +03:00
Reima Hyvönen
50e0a47b7a
Took away __restrict
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8a39eb674e
Removed c-variable from calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
bc0a36830d
Clerified some 6 pixel loads
2019-08-07 16:35:24 +03:00
Reima Hyvönen
1a8b211e05
Added break to line 170
2019-08-07 16:35:24 +03:00
Reima Hyvönen
d05e750ebe
Added some switches to prevent segmentation fault from reading
2019-08-07 16:35:24 +03:00
Reima Hyvönen
203580047d
Defined some AVX functions
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c884c738b1
Updated some commands to match the standard
2019-08-07 16:35:24 +03:00
Reima Hyvönen
b412ed2f59
Removed some setr and used loads calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c6cc063534
converted some hadd operations at calc_sao_edge_dir_avx2 to cast and extract
2019-08-07 16:35:24 +03:00
Reima Hyvönen
47ac109b10
optimated some sao_reconstruct_color_avx2 when sao->type == SAO_TYPE_BAND
2019-08-07 16:35:24 +03:00
Reima Hyvönen
96dc60a1ed
first working optimation
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c148aff9fb
Some optimation done to function sao_reconstruct_color_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
bf16ba6cc4
Remade sao_edge_ddistortion_avx2 and calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
79dc39a676
Some editing for sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
06ee52924e
some reconst done to calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
5fbc65d823
reconst optimation doesn't work yet
2019-08-07 16:35:24 +03:00
Reima Hyvönen
d29f834a69
Remove useless function
2019-08-07 16:35:24 +03:00
Reima Hyvönen
a232a12160
calc_sao_edge_dir_avx2 updated
2019-08-07 16:35:24 +03:00
Reima Hyvönen
b1febc02a5
sao_edge_ddistortion_avx2 now working proberly
2019-08-07 16:35:24 +03:00
Reima Hyvönen
cd6092a1ec
Still too much bits, looking for where they appear
2019-08-07 16:35:24 +03:00
Reima Hyvönen
7853be8eeb
Incomple optimation
2019-08-07 16:35:24 +03:00
Ari Lemmetti
40609aa865
Add missing headers to Makefile.am
2019-07-12 19:15:51 +03:00
Ari Lemmetti
5db3a78499
Bump versions for release 1.3
2019-07-09 22:09:32 +03:00
Ari Lemmetti
d513ab1999
Add missing newline
2019-07-09 21:06:05 +03:00
Ari Lemmetti
4967072625
Do not bypass search on skip cu if early_skip is not enabled
2019-07-09 20:20:12 +03:00
Ari Lemmetti
b20992a9f3
Rename functions more descriptive
2019-07-09 20:20:11 +03:00
Ari Lemmetti
a348a0ec23
Fix transform depth in early skip
2019-07-09 20:05:48 +03:00
Pauli Oikkonen
8d48bee180
Tidy fast coeff cost code
2019-07-09 18:01:54 +03:00
Pauli Oikkonen
201a43b08e
Clean up the RD-estimation code
2019-07-09 18:01:54 +03:00
Pauli Oikkonen
b111df5073
Create preliminary version of improved cost estimator
2019-07-09 18:01:54 +03:00
Ari Lemmetti
be08a87d94
Add missing parameter max-merge to the help message
2019-07-09 16:28:46 +03:00
Ari Lemmetti
d0bb9b4a6d
Add parameter max-merge to presets
2019-07-09 16:26:03 +03:00
Ari Lemmetti
4097331fd6
Early skip
2019-07-09 15:59:31 +03:00
Joose Sainio
977e885ea2
Fix issue with gop=0 introduced in 1c36f68d0c
2019-07-05 12:57:27 +03:00
Mikko Pitkänen
a7f09c8114
Merge branch 'threadwrapper'
2019-06-24 16:54:59 +03:00
Mikko Pitkänen
3dd606ce2e
Add new threadwrapper
2019-06-18 18:45:45 +03:00
Joose Sainio
c94077d15e
remove hardcoded value
2019-06-12 14:37:41 +03:00
Joose Sainio
ac68c8444d
remove negation that wasn't supposed to be there
2019-06-12 14:35:24 +03:00
Joose Sainio
5851dcc3be
missing negation
2019-06-12 14:08:18 +03:00
Joose Sainio
1c36f68d0c
Fix owf>=9 gop=8 and add test to catch such problem in future
2019-06-12 14:04:41 +03:00
Ari Lemmetti
933ff6ed55
Merge branch 'set-qp-in-cu-fix'
2019-06-07 09:01:03 +03:00
Ari Lemmetti
c6da839002
Set lcu sqrt lambda according to lcu lambda instead of frame lambda when ROI is used
2019-05-29 18:32:10 +03:00
Ari Lemmetti
9339845e8b
Set QP completely at CU level as the name '--set-qp-in-cu' implies
...
-Move slice delta QP to CU level when using --set-qp-in-cu
-Separate functionality from roi
2019-05-24 20:38:39 +03:00
Pauli Oikkonen
081d16fc33
Fix intrinsics that may be missing on some systems
...
Create a header to collect all the workarounds for missing intrinsics
in one place
2019-05-23 19:59:40 +03:00
Pauli Oikkonen
87a9208db8
Eliminate cvtsi64_si128 intrinsic
...
Apparently it'll cause Win32 builds to break because it emits the movq
instruction or something..
2019-04-17 16:30:40 +03:00
Pauli Oikkonen
7175d20bb2
Still include stdint.h for non-vector builds
2019-04-15 19:36:01 +03:00
Pauli Oikkonen
1315c7e2b0
Do not compile any vector code for non-SSE4/AVX2 builds
2019-04-15 19:10:48 +03:00
Pauli Oikkonen
f5f70e7bc5
Merge branch 'sad-optimization'
2019-04-15 19:02:01 +03:00
Jan Beich
85f46e17a9
Detect AltiVec via elf_aux_info() on FreeBSD 12+
2019-04-01 13:08:04 +00:00
Jan Beich
82486255da
Simplify AltiVec detection on Linux
2019-04-01 13:08:04 +00:00
Pauli Oikkonen
6d43759604
Create a border-respecting 32-wide AVX hor_sad
2019-03-07 18:01:22 +02:00
Pauli Oikkonen
f218cecb38
Remove offending hor_sad_avx2_w32 function
...
Consider possibly creating a non-offending AVX2 version instead, the
way hor_sad_sse41_w32 works. Or maybe there's more essential work to
do.
2019-03-05 22:51:41 +02:00
Pauli Oikkonen
df2e6c54fd
4-unroll hor_sad_sse41_arbitrary
...
This may not increase perf though because it's so rarely used
function, so keeping icache footprint may be more essential...
2019-03-05 22:45:23 +02:00
Pauli Oikkonen
448eacba7b
Avoid overreading block borders in hor_sad_sse41_arbitrary
2019-03-05 22:34:50 +02:00
Eemeli Kallio
c159e275b7
Merge branch 'max_merge'
2019-03-05 14:39:03 +02:00
Pauli Oikkonen
41f51c08c4
Avoid overrunning buffer in hor_sad_sse41_w32
2019-03-01 15:37:38 +02:00
Pauli Oikkonen
bcd9879359
Include quant coeff range check in non-scaling list execution path too
2019-02-27 17:26:44 +02:00
Pauli Oikkonen
24e6363f64
Remove the kvz_quant_avx2 wrapper function
2019-02-27 16:32:58 +02:00
Pauli Oikkonen
748820f3c5
Eliminate unnecessary loading of coeffs if scaling lists are off
2019-02-27 16:26:35 +02:00
Pauli Oikkonen
5994350f40
Allow quant_flat_avx2 to be used with scaling lists on
2019-02-27 16:25:59 +02:00
Eemeli Kallio
7f4e0acf41
Added check if max-merge is out of bounds
2019-02-19 13:53:42 +02:00
Pauli Oikkonen
9b0e079262
Use SSE instructions for 64-bit SADs instead of MMX
...
VC++ seems to choke on MMX instructions
2019-02-18 20:13:33 +02:00
Pauli Oikkonen
d8b8923028
Add LGPL notices to reg_sad headers
2019-02-18 17:52:47 +02:00
Eemeli Kallio
2a40560888
some variables to const
2019-02-12 11:24:10 +02:00
Eemeli Kallio
8f8e7bb53c
Added possibility to reduce number of maximum number of merge candidates.
2019-02-12 09:21:03 +02:00
Pauli Oikkonen
770db825b9
Create hor_sad_w8 and w4 epol mask the way w16 works
2019-02-06 19:34:26 +02:00
Pauli Oikkonen
aa19bcac8a
Avoid branching in creating shuffle mask in hor_sad_w16
2019-02-06 18:58:46 +02:00
Pauli Oikkonen
2d05ca8520
Remove width from constant-width hor_sad func params
...
They should kinda know it already
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
57db234d95
Move 32-wide SSE4.1 hor_sad to picture-sse41.c
...
It's not used by picture-avx2.c that also includes the header, so
it should not be in the header
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
dd7d989a39
Implement 32-wide hor_sad on AVX2
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ff70c8a5ec
Utilize horizontal SAD functions for SSE4.1 as well
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
f5ff4db01f
4-wide hor_sad border agnostic
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
35e7f9a700
Fix hor_sad w8 to work with both borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
836783dd6e
Use hor_sad_w32 for both left and right borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
69687c8d24
Modify hor_sad_sse41_w16 to work over left and right borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
51c2abe99a
Modify image_interpolated_sad to use kvz_hor_sad
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
1e0eb1af30
Add generic strategy for hor_sad'ing an non-split width block
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
686fb2c957
Unroll arbitrary-width SSE4.1 hor_sad by 4
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
768203a2de
First version of arbitrary-width SSE4.1 hor_sad
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ccf683b9b6
Start work on left and right border aware hor_sad
...
Comes with 4, 8, 16 and 32 pixel wide implementations now, at some point
investigate if this can start to thrash icache
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
760bd0397d
Pad the image buffer by 64 bytes from both ends
...
This will be necessary for an efficient and straightforward
implementation of hor_sad for blocks over 16 pixels wide, because they
cannot use the shuffle trick because inter-lane shuffling is so hard to
do
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
c36482a11a
Fix bug in 24-wide SAD
...
*facepalm*
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
f781dc31f0
Create strategy for ver_sad
...
Easy to vectorize
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ca94ae9529
Handle extrapolated blocks with unmodified width using optimized_sad pointer
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
91b30c7064
Tidy up kvz_image_calc_sad
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
9db0a1bcda
Create get_optimized_sad func for SSE4.1
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
91380729b1
Add generic get_optimized_sad implementation
...
NOTE: To force generic SAD implementation on devices supporting
vectorized variants, you now have to override both get_optimized_sad
and reg_sad to generic (only overriding get_optimized_sad on AVX2
hardware would just run all SAD blocks through reg_sad_avx2). Let's
see if there's a more sensible way to do it, but it's not trivial.
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
45f36645a6
Move choosing of tailored SAD function higher up the calling chain
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
91cb0fbd45
Create strategy for directly obtaining pointer to constant-width SAD function
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
94035be342
Unify unrolling naming conventions
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
517a4338f6
Unroll SSE SAD for 8-wide blocks to process 4 lines at once
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
0f665b28f6
Unroll arbitrary width SSE4.1 SAD by 4
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
cbca3347b5
Unroll 64-wide AVX2 SAD by 2
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
84cf771dea
Unroll 32 and 16 wide SAD vector implementations by 4
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
5df5c5f8a4
Cast all pointers to const types in vector SAD funcs
...
Also tidy up the pointer arithmetic
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
a711ce3df5
Inline fixed width vectorized SAD functions
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
6504145cce
Remove 16-pixel wide AVX2 SAD implementation
...
At least on Skylake, it's noticeably slower than the very simple
version using SSE4.1
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
4cb371184b
Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
796568d9cc
Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
4d45d828fa
Use constant-width SSE4.1 SAD funcs for AVX2
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
2eaa7bc9d2
Move SSE4.1 SAD functions to separate header
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
d2db0086e1
Create constant width SAD versions for 8 and 16 pixels
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
a13fc51003
Include a blank AVX2 strategy registration function even in non-AVX2 builds
2019-02-04 19:52:24 +02:00
Pauli Oikkonen
d55414db66
Only build AVX2 coeff encoding when supported
...
..whoops
2019-02-04 19:34:30 +02:00
Pauli Oikkonen
3fe2f29456
Merge branch 'encode-coeffs-avx2'
2019-02-04 18:52:31 +02:00
Pauli Oikkonen
722b738888
Fix more naming issues
2019-02-04 16:05:43 +02:00
Pauli Oikkonen
e26d98fb75
Rename a couple variables and add crucial comments
2019-02-04 15:57:07 +02:00
Pauli Oikkonen
f186455619
Move encode_last_significant_xy out of strategy modules
...
It's the exact same in both AVX2 and generic, and does not seem to
be worth even trying to vectorize
2019-02-04 14:55:41 +02:00
Pauli Oikkonen
3f7340c932
Fine-tune pack_16x16b_to_16x2b
...
Avoid mm_set1 operation when it's possible to create the constant with
one bit-shift operation from another instead. Thanks Intel for
3-operand instruction encoding!
2019-02-04 14:44:47 +02:00
Pauli Oikkonen
314f5b0e1f
Rename 16x2b cmpgt function, comment it better, optimize it slightly
...
Eliminate an unnecessary bit masking to make it even more messy
2019-02-04 14:44:32 +02:00
Pauli Oikkonen
d8ff6a6459
Fix _andn_u32 to work on old Visual Studio
2019-02-01 15:34:42 +02:00
Pauli Oikkonen
26e1b2c783
Use (u)int32_t instead of (unsigned) int in reg_sad_sse41
2019-01-10 14:37:04 +02:00
Pauli Oikkonen
3a1f2eb752
Prefer SSE4.1 implementation of SAD over AVX2
...
It seems that the 128-bit wide version consistently outperforms the
256-bit one
2019-01-10 13:48:55 +02:00
Pauli Oikkonen
9b24d81c6a
Use SSE instead of AVX for small widths
...
Highly dubious if this will help performance at all
2019-01-07 20:12:13 +02:00
Pauli Oikkonen
b2176bf72a
Optimize SSE4.1 version of SAD
...
Make it use the same vblend trick as AVX2. Interestingly, on my test
setup this seems to be faster than the same code using 256-bit AVX
vectors.
2019-01-07 19:40:57 +02:00
Pauli Oikkonen
887d7700a8
Modify AVX2 SAD to mask data by byte granularity in AVX registers
...
Avoids using any SAD calculations narrower than 256 bits, and
simplifies the code. Also improves execution speed
2019-01-07 18:53:15 +02:00
Pauli Oikkonen
7585f79a71
AVX2-ize SAD calculation
...
Performance is no better than SSE though
2019-01-07 16:26:24 +02:00
Pauli Oikkonen
ab3dc58df6
Copy SAD SSE4.1 impl to AVX2
2019-01-03 18:31:57 +02:00
Pauli Oikkonen
45ac6e6d03
Tidy pack_16x16b_to_16x2b comments
2019-01-03 16:37:05 +02:00
Ari Lemmetti
cd818db724
Add missing quantization and residual in cost calculation (inter rd=2).
2018-12-21 15:55:29 +02:00
Pauli Oikkonen
016eb014ad
Move packing 16x16b -> 16x2b into separate function
2018-12-20 10:51:44 +02:00
Ari Lemmetti
b234897e8a
Fix smp and amp blocks in fme and revert previous change.
...
Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc.
Calculate SATD on the 8x4, ... part
2018-12-19 21:30:53 +02:00
Pauli Oikkonen
9aaa6f260d
Fixes to enable portability
2018-12-18 20:42:09 +02:00
Pauli Oikkonen
2fdbbe9730
Move CG reordering code from quant-avx2 to shared header
2018-12-18 19:42:18 +02:00
Pauli Oikkonen
d02207306d
Create a header file for shared AVX2 code
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
361bf0c7db
Precompute >=2 coeff encoding loop with 2-bit arithmetic
...
Who needs 16x16b vectors when you can do practically the same with
16x2b pseudovectors in 32-bit general purpose registers!
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
940b0e9e6a
Require BMI2 for AVX2 build
...
Any processor implementing AVX2 should also implement BMI2
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
f66cb23d5b
Optimize greater1 encoding loop
...
Calculating the c1 variable need not be a serial operation!
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
8c8b791c35
Vectorize kvz_context_get_sig_ctx_inc
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
033261eb74
Eliminate two branches using bit magic
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
c4434e8d04
Scan CG's in forward order to simplify finding last significant
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
efd097f5a5
Vectorize the coeff group loop to some extent
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
a01362e638
use the efficient method of reordering raster->scan
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
50a888e789
Use the efficient method to find first and last nz coeffs in block
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
7e9203f566
Scan coeff groups in scan order to help find last significant one
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
9a5a6fdbc7
Simplify two ifs in encode_coeff_nxn-avx2
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
37a2a8bac8
See if loop can be optimized by rearranging
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
584f2f74b6
Vectorize significant coeff group scanning loop
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
1bfed73221
Add AVX2 strategy for encode_coding_tree
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
c3a6f3112a
Add generic strategy group for encode_coding_tree
2018-12-18 19:41:09 +02:00
Marko Viitanen
1ef851ab4b
Disable FME on amp/smp blocks with width or height not divisible by 8
2018-12-18 10:28:21 +02:00
Joose Sainio
b71c5573f0
Merge branch 'rate_control_fix'
2018-12-17 12:39:27 +02:00
Sergei Trofimovich
68a70e45a1
x86 asm: mark stack as non-executable
...
Gentoo's `scanelf` QA tool detects writable/executable stack
of assembly-writtent files as:
```
$ scanelf -qRa .
0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o
0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o
0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-sad.o
0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-satd.o
```
Normally C compiler emits non-executable stack marking (or GNU assembler
via `-Wa,--noexecstack`).
The change adds non-executable stack marking for yasm-based assmbly files.
https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2018-12-16 11:31:56 +00:00
Reima Hyvönen
1fcc5c6a8d
Merge branch 'bipred_recon'
2018-12-11 09:59:35 +02:00
Reima Hyvönen
e4a10880f3
Added case 12 to bipred_recon no mov
2018-12-11 09:52:17 +02:00
Marko Viitanen
a4f3968e52
Fix Visual Studio errors by initializing some variables used in AVX2 signhiding
2018-12-11 09:33:26 +02:00
Ari Lemmetti
ac943147e3
Calculate satd cost for whole non-square blocks as well.
2018-12-10 17:04:29 +02:00
Pauli Oikkonen
c465578048
Add a descriptive comment to coefficient reordering
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
f78bf2ebcb
Optimize q_coefs usage for indexed fetch
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
d9591f1b49
Eliminate midway buffering of reordered coefs
...
TODO: For some mysterious reason seems slightly slower than the
buffered one
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
7fe454c51f
Optimize get_cheapest_alternative()
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
6bbd3e5a44
Optimize rearrange_512 function
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
cb8209d1b3
Vectorize transform coefficient reordering loop
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
7cf4c7ae5f
Rename "reduce" functions to hsum
...
That's what the functions fundamendally do anyway
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
316cd8a846
Fix ALIGNED keyword and grow alignment to 64B
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
1befc69a4c
Implement sign bit hiding in AVX2
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
c5cd03497e
Require BMI and ABM instruction sets for AVX2 build
...
AVX2 support on a processor should always imply BMI and ABM support.
The lzcnt and tzcnt instructions have more suitable semantics in the
corner case that source word is 0, and allow us to even handle that
scenario without a branch. Apparently Visual Studio will already
include this support when building with AVX2 enabled, so only the
automake files need to be tweaked.
2018-12-03 15:36:32 +02:00
Reima Hyvönen
f8696b54a4
Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks that can be not equal to 8 (ie. width = 12)
2018-11-20 17:09:19 +02:00
Marko Viitanen
a5a10a33c3
Enable --scaling-list parameter and add to the documentation
2018-11-19 10:47:30 +02:00
Reima Hyvönen
710ba288db
Chroma has some problems
2018-11-15 16:42:48 +02:00
Sami Ahovainio
8f98d4aac7
Added square search
2018-11-14 14:50:31 +02:00
Marko Viitanen
6871490dd5
Simplify get_mvd_coding_cost(), only include golomb coding
2018-11-14 14:33:31 +02:00
Ari Lemmetti
a832206bb6
Replace 32-bit incompatible instrinsics
2018-11-12 18:54:33 +02:00
Ari Lemmetti
5c774c4105
Rewrite most of FME and interpolation filters
...
Changes had to break a lot of stuff and were just squashed into this horrible code dump
2018-11-08 20:21:16 +02:00
Joose Sainio
1c8a1f24e2
Don't assume anything about bits spent
2018-11-07 16:03:38 +02:00
Joose Sainio
3471e2470d
Fix using uninitialized value for the first frame
2018-11-07 08:17:39 +02:00
Joose Sainio
d95ac11a3b
Fix rate_control for other LP-GOPS
2018-11-06 14:20:44 +02:00
Joose Sainio
67a6ba667e
Fix rate control for flat lp-gop
2018-11-06 09:38:17 +02:00
Reima Hyvönen
7406c33a42
Some more cleaning
2018-10-26 12:25:18 +03:00
Reima Hyvönen
4c71546b2e
Cleaned some coding
2018-10-26 12:19:44 +03:00
Reima Hyvönen
4fe3909e48
Switched luma to use 32bits size ints intstead of 16bit size
2018-10-24 18:24:46 +03:00
Eemeli Kallio
284e73839e
Calculating zero cost moved to its own function
2018-10-16 11:02:01 +03:00
Reima Hyvönen
381e786e10
Trying to find the bug in luma
2018-10-11 18:08:41 +03:00
Marko Viitanen
c589e5ed36
Fix closed-gop frame feed, the ordering was incorrect after the first GOP
2018-10-10 11:12:03 +03:00
Reima Hyvönen
2f5f81bac3
removed the non-optimated bipred function
2018-10-09 11:19:23 +03:00
Marko Viitanen
75dce4f3ce
Fix low-delay-gop usage with --no-open-gop
2018-10-04 15:16:02 +03:00
Marko Viitanen
de71b58f76
Change closed GOP structure to include an additional IDR between GOPs
2018-10-04 11:17:03 +03:00
Reima Hyvönen
212a8e68fa
Modified to avoid memory overflow, still some bug inside luma
2018-10-02 20:23:32 +03:00
Marko Viitanen
954f07e3d7
Add --(no-)open-gop option
2018-10-02 10:05:32 +03:00
Marko Viitanen
8bef85e056
Merge branch 'set-qp-in-cu'
2018-09-03 08:33:33 +03:00
Ari Lemmetti
2fdcc2b79d
Add option --set-qp-in-cu
2018-09-03 08:32:45 +03:00
Reima Hyvönen
896034b7cf
Some renamed functions back
2018-08-28 15:31:10 +03:00
Reima Hyvönen
e8b5e6db4c
Did some merging
2018-08-28 15:26:27 +03:00
Reima Hyvönen
7de5c74434
Updated bipred_recon to work faster
2018-08-28 15:12:31 +03:00
Reima Hyvönen
47b357cca2
Comment one test
2018-08-27 18:52:14 +03:00
Reima Hyvönen
2ca99a44e8
Updated shuffle operation to be in right order
2018-08-27 18:16:38 +03:00
Marko Viitanen
b85ae3688e
Signal QP in slice header if tiles and slices=tiles are enabled
...
Keeps the PPS constant for various purposes
2018-08-16 08:44:39 +03:00
Reima Hyvönen
508b218a12
some modifications made to prevent reading too much
2018-08-14 10:50:39 +03:00
Reima Hyvönen
1d935ee888
some useless stuff removed
2018-08-13 16:47:11 +03:00
Reima Hyvönen
ce3ac4c05e
some modifications to no_mov
2018-08-13 16:41:02 +03:00
Reima Hyvönen
15a613ae94
test if no_mov breaks testing
2018-08-13 16:02:56 +03:00
Reima Hyvönen
97a2049e58
removed pointer declaration out from switch
2018-08-10 16:42:26 +03:00
Reima Hyvönen
aa94bcedbc
Stream is now pointer
2018-08-10 16:38:49 +03:00
Reima Hyvönen
fa5b227ece
256 to 32 doesn't work, made them by hand
2018-08-10 16:01:20 +03:00
Reima Hyvönen
408dedbcc8
removed _mm256_extract_epi8 and replaced with _mm_stream
2018-08-10 15:53:26 +03:00
Reima Hyvönen
31c35091c6
_mm256_cvtsi256_si32 removed
2018-08-10 10:06:40 +03:00
Reima Hyvönen
99dc43074f
_mm256_cvtsi256_si32 breaks system, too much bits. back to extract
2018-08-10 09:59:33 +03:00
Reima Hyvönen
4f1f80b2cb
Transformed convert from 256 to cast 256 -> 128 and then convert from 128
2018-08-09 15:35:54 +03:00
Reima Hyvönen
4957555eb3
Removed leftover from 939
2018-08-09 15:25:03 +03:00
Reima Hyvönen
28b165c971
Clearified some sections, added _MM_SHUFFLE macro
2018-08-09 15:23:01 +03:00
Reima Hyvönen
dd04df8667
testing if error in both avx2 functions
2018-08-03 11:49:00 +03:00
Reima Hyvönen
ed50d71fde
Switched some variables to different location, altered inter_recon_bipred_avx2 function
2018-08-02 16:08:59 +03:00
Reima Hyvönen
f5739a0028
Renaming and removing useless prints
2018-08-02 14:47:17 +03:00
Reima Hyvönen
bc09f59bb6
Edited some definitions
2018-08-02 11:54:53 +03:00
Arttu Ylä-Outinen
83555c3d6d
Enable --fast-residual-cost with fastest presets
2018-07-16 12:31:20 +03:00
Arttu Ylä-Outinen
c438bb4a19
Add an option to skip CABAC for residual costs
...
Adds command line option --fast-residual-cost=<limit>. When QP is below
the limit, estimates the cost of coding the residual coefficients from
the sum of absolute coefficients. Skipping CABAC is not worth it with
high QPs because there are fewer coefficients so CABAC is not as slow.
2018-07-16 12:31:20 +03:00
Reima Hyvönen
a4bf77f208
Tested some extract functions
2018-07-12 09:29:32 +03:00
Reima Hyvönen
c05033a893
Even more useless vectors removed
2018-07-11 15:09:14 +03:00
Reima Hyvönen
884cb77238
Removed some not used vectors
2018-07-11 15:06:11 +03:00
Reima Hyvönen
792689a5ff
Removed for-loops, added extract instead
2018-07-11 14:56:41 +03:00
Reima Hyvönen
f9c7f6ee66
Added some break-operations for avx2 optimation
2018-07-11 14:15:38 +03:00
Reima Hyvönen
cc064da143
some more optimation for bipred
2018-07-11 11:27:54 +03:00
Reima Hyvönen
9a339eef89
Merge branch 'bipred_recon' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into HEAD
...
# Conflicts:
# build/kvazaar_lib/kvazaar_lib.vcxproj
2018-07-10 16:21:04 +03:00
Reima Hyvönen
a22cf03ddb
Updated to have no movement function to avx2 strategies
2018-07-10 16:07:15 +03:00
Arttu Ylä-Outinen
b7474eb532
Fix SAO buffer sizes
...
Increases sizes of buffers used for SAO reconstruction to avoid stack
buffer overflow in AVX2 SAO reconstruction.
2018-07-05 15:56:30 +03:00
Arttu Ylä-Outinen
b37470e80f
Merge pull request #207 from jbeich/maltivec
...
Unbreak build on PowerPC if AltiVec isn't supported
2018-07-04 11:06:41 +03:00
Reima Hyvönen
ea83ae45f0
Toimiva ratkaisu
2018-07-03 11:18:51 +03:00
Jan Beich
4f4bea7496
Check -maltivec is supported before using
...
PowerPC target may lack or have non-standard FPU:
$ cc -dumpmachine
powerpcspe-undermydesk-freebsd
$ cc -c -maltivec -Isrc src/strategies/altivec/picture-altivec.c
src/strategies/altivec/picture-altivec.c:1: error: AltiVec and E500 instructions cannot coexist
2018-07-02 23:25:23 +00:00
Jan Beich
b892d820f8
Clean up macOS includes on powerpc* after 93e1c9f1c3
...
strategyselector.c:426:25: machine/cpu.h: No such file or directory
2018-07-02 21:52:45 +00:00
Reima Hyvönen
17babfffa4
25.6 working optimation, ~50% faster than original
2018-06-25 17:06:16 +03:00
Arttu Ylä-Outinen
2f995f4325
Merge pull request #205 from jbeich/powerpc
...
Unbreak build on non-Linux powerpc*
2018-06-19 13:28:00 +03:00
Arttu Ylä-Outinen
c1398ef818
Permit --period=1 with any GOP structure
...
All intra coding is a special case so it can be permitted even though
Kvazaar normally only supports intra periods that are divisible by the
GOP length.
2018-06-18 12:26:11 +03:00
Arttu Ylä-Outinen
abdebe0bf9
Fix --owf help message
...
The number of parallel frames is --owf plus one, not --owf minus one.
Fixes #204 .
2018-06-18 09:33:36 +03:00
Jan Beich
93e1c9f1c3
Add AltiVec detection for BSDs
...
strategyselector.c:377:26: linux/auxvec.h: No such file or directory
2018-06-17 15:38:24 +00:00
Miika Metsoila
98972d26c2
Document that the high tier requires level 4 or higher
2018-06-14 12:41:03 +03:00
Miika Metsoila
62b44efaa4
Write the encoding tier (main/high) into the bitstream
2018-06-14 12:41:03 +03:00
Arttu Ylä-Outinen
a343f6d587
Prepare for delta QPs at CU-level
...
- Replaces lcu_dqp_enabled with max_qp_delta_depth in encoder_control_t.
- Fixes set_cu_qps so that it can handle quantization groups of
arbitrary size.
- Fixes computation of QP predictors so that it works for quantization
groups of arbitrary size.
2018-06-13 15:36:19 +03:00
Arttu Ylä-Outinen
dc6b2024ea
Modify reference count asserts to fix data races
...
Changes asserts on the reference count of objects to assert the value
after KVZ_ATOMIC_INC instead of directly checking the value. Fixes some
data races detected by TSan.
2018-06-12 09:35:07 +03:00
Ari Lemmetti
4fb1c16c61
Add early termination for intra rdo when a zero coefficient block is found.
2018-06-08 21:03:07 +03:00
Ari Lemmetti
492529fb7a
Add the same comment to help message as well...
2018-05-30 14:13:15 +03:00
Ari Lemmetti
0d5972bf03
Add missing sort to intra transform split search so mode at 0 is the best
2018-05-21 13:10:38 +03:00
Sebastien Alaiwan
954bca7d6e
Fix memset parameter
2018-05-17 11:24:49 +02:00
Jaakko Laitinen
f9466efcbb
Close file on error
2018-05-15 11:50:16 +03:00
Reima Hyvönen
9fed29f950
optimation for inter_recon_bipred
2018-04-18 15:25:44 +03:00
Arttu Ylä-Outinen
5c585c4fbc
Update help message
...
Updates the default option values to match the medium preset.
2018-04-03 10:40:37 +03:00
Arttu Ylä-Outinen
2b4e22111a
Update presets
...
The new presets are slower but have better coding efficiency.
2018-04-03 10:37:30 +03:00
Arttu Ylä-Outinen
7185519a1b
Update command line help
...
- Adds missing default values.
- Adds help for --crypto and --key.
- Adds help for --rd=3.
- Adds help for --sao options.
- Some changes to help wording.
2018-03-23 14:33:04 +02:00
Arttu Ylä-Outinen
3606860504
Add --no-cpuid option
...
Equivalent to --cpuid=0.
2018-03-23 12:32:27 +02:00
Arttu Ylä-Outinen
fb462b25ef
Fix transform skip for inter
...
The transform skip flag in cu_info_t was stored under the intra
substruct even though transform skip can be used for inter as well. This
caused bitstream errors. Fixed by moving the flag out of the substruct.
2018-03-20 11:01:33 +02:00