3387 commits

Author SHA1 Message Date
Pauli Oikkonen 24e6363f64 Remove the kvz_quant_avx2 wrapper function 2019-02-27 16:32:58 +02:00
Pauli Oikkonen 748820f3c5 Eliminate unnecessary loading of coeffs if scaling lists are off 2019-02-27 16:26:35 +02:00
Pauli Oikkonen 5994350f40 Allow quant_flat_avx2 to be used with scaling lists on 2019-02-27 16:25:59 +02:00
Eemeli Kallio 7f4e0acf41 Added check if max-merge is out of bounds 2019-02-19 13:53:42 +02:00
Pauli Oikkonen 1c81c4f779 Add reg_sad headers to VC project 2019-02-18 20:23:31 +02:00
Pauli Oikkonen 9b0e079262 Use SSE instructions for 64-bit SADs instead of MMX
VC++ seems to choke on MMX instructions
2019-02-18 20:13:33 +02:00
Pauli Oikkonen d8b8923028 Add LGPL notices to reg_sad headers 2019-02-18 17:52:47 +02:00
Eemeli Kallio 2a40560888 some variables to const 2019-02-12 11:24:10 +02:00
Eemeli Kallio 8f8e7bb53c Added possibility to reduce the maximum number of merge candidates. 2019-02-12 09:21:03 +02:00
Marko Viitanen 1165219842 Update PTL, SPS ext and SPS flags to match VTM 4rc1 2019-02-07 10:00:04 +02:00
Pauli Oikkonen 770db825b9 Create hor_sad_w8 and w4 epol mask the way w16 works 2019-02-06 19:34:26 +02:00
Pauli Oikkonen aa19bcac8a Avoid branching in creating shuffle mask in hor_sad_w16 2019-02-06 18:58:46 +02:00
Pauli Oikkonen 2d05ca8520 Remove width from constant-width hor_sad func params
They should kinda know it already
2019-02-04 20:41:40 +02:00
Pauli Oikkonen 57db234d95 Move 32-wide SSE4.1 hor_sad to picture-sse41.c
It's not used by picture-avx2.c that also includes the header, so
it should not be in the header
2019-02-04 20:41:40 +02:00
Pauli Oikkonen dd7d989a39 Implement 32-wide hor_sad on AVX2 2019-02-04 20:41:40 +02:00
Pauli Oikkonen ff70c8a5ec Utilize horizontal SAD functions for SSE4.1 as well 2019-02-04 20:41:40 +02:00
Pauli Oikkonen f5ff4db01f 4-wide hor_sad border agnostic 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 35e7f9a700 Fix hor_sad w8 to work with both borders 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 836783dd6e Use hor_sad_w32 for both left and right borders 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 69687c8d24 Modify hor_sad_sse41_w16 to work over left and right borders 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 51c2abe99a Modify image_interpolated_sad to use kvz_hor_sad 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 1e0eb1af30 Add generic strategy for hor_sad'ing a non-split width block 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 686fb2c957 Unroll arbitrary-width SSE4.1 hor_sad by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 768203a2de First version of arbitrary-width SSE4.1 hor_sad 2019-02-04 20:41:40 +02:00
Pauli Oikkonen ccf683b9b6 Start work on left and right border aware hor_sad
Comes with 4, 8, 16 and 32 pixel wide implementations now; at some point,
investigate whether this can start to thrash the icache
2019-02-04 20:41:40 +02:00
Pauli Oikkonen 760bd0397d Pad the image buffer by 64 bytes from both ends
This will be necessary for an efficient and straightforward
implementation of hor_sad for blocks over 16 pixels wide: they cannot
use the shuffle trick, because inter-lane shuffling is so hard to do
2019-02-04 20:41:40 +02:00
Pauli Oikkonen c36482a11a Fix bug in 24-wide SAD
*facepalm*
2019-02-04 20:41:40 +02:00
Pauli Oikkonen f781dc31f0 Create strategy for ver_sad
Easy to vectorize
2019-02-04 20:41:40 +02:00
Pauli Oikkonen ca94ae9529 Handle extrapolated blocks with unmodified width using optimized_sad pointer 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 91b30c7064 Tidy up kvz_image_calc_sad 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 9db0a1bcda Create get_optimized_sad func for SSE4.1 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 007fb7ae19 Un-break tests
Pass the NULL optimized function pointer from the test function, it
should still forward execution to width-specific SAD implementations
2019-02-04 20:41:40 +02:00
Pauli Oikkonen 91380729b1 Add generic get_optimized_sad implementation
NOTE: To force generic SAD implementation on devices supporting
vectorized variants, you now have to override both get_optimized_sad
and reg_sad to generic (only overriding get_optimized_sad on AVX2
hardware would just run all SAD blocks through reg_sad_avx2). Let's
see if there's a more sensible way to do it, but it's not trivial.
2019-02-04 20:41:40 +02:00
Pauli Oikkonen 45f36645a6 Move choosing of tailored SAD function higher up the calling chain 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 91cb0fbd45 Create strategy for directly obtaining pointer to constant-width SAD function 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 94035be342 Unify unrolling naming conventions 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 517a4338f6 Unroll SSE SAD for 8-wide blocks to process 4 lines at once 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 0f665b28f6 Unroll arbitrary width SSE4.1 SAD by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen cbca3347b5 Unroll 64-wide AVX2 SAD by 2 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 84cf771dea Unroll 32 and 16 wide SAD vector implementations by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 5df5c5f8a4 Cast all pointers to const types in vector SAD funcs
Also tidy up the pointer arithmetic
2019-02-04 20:41:40 +02:00
Pauli Oikkonen a711ce3df5 Inline fixed width vectorized SAD functions 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 6504145cce Remove 16-pixel wide AVX2 SAD implementation
At least on Skylake, it's noticeably slower than the very simple
version using SSE4.1
2019-02-04 20:41:40 +02:00
Pauli Oikkonen 4cb371184b Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 796568d9cc Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 4d45d828fa Use constant-width SSE4.1 SAD funcs for AVX2 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 2eaa7bc9d2 Move SSE4.1 SAD functions to separate header 2019-02-04 20:41:40 +02:00
Pauli Oikkonen d2db0086e1 Create constant width SAD versions for 8 and 16 pixels 2019-02-04 20:41:40 +02:00
Pauli Oikkonen a13fc51003 Include a blank AVX2 strategy registration function even in non-AVX2 builds 2019-02-04 19:52:24 +02:00
Pauli Oikkonen d55414db66 Only build AVX2 coeff encoding when supported
..whoops
2019-02-04 19:34:30 +02:00