Commit graph

26 commits

Author SHA1 Message Date
Pauli Oikkonen df2e6c54fd 4-unroll hor_sad_sse41_arbitrary
This may not increase perf though because it's so rarely used
function, so keeping icache footprint may be more essential...
2019-03-05 22:45:23 +02:00
Pauli Oikkonen 448eacba7b Avoid overreading block borders in hor_sad_sse41_arbitrary 2019-03-05 22:34:50 +02:00
Pauli Oikkonen 9b0e079262 Use SSE instructions for 64-bit SADs instead of MMX
VC++ seems to choke on MMX instructions
2019-02-18 20:13:33 +02:00
Pauli Oikkonen d8b8923028 Add LGPL notices to reg_sad headers 2019-02-18 17:52:47 +02:00
Pauli Oikkonen 770db825b9 Create hor_sad_w8 and w4 epol mask the way w16 works 2019-02-06 19:34:26 +02:00
Pauli Oikkonen aa19bcac8a Avoid branching in creating shuffle mask in hor_sad_w16 2019-02-06 18:58:46 +02:00
Pauli Oikkonen 2d05ca8520 Remove width from constant-width hor_sad func params
They should kinda know it already
2019-02-04 20:41:40 +02:00
Pauli Oikkonen 57db234d95 Move 32-wide SSE4.1 hor_sad to picture-sse41.c
It's not used by picture-avx2.c that also includes the header, so
it should not be in the header
2019-02-04 20:41:40 +02:00
Pauli Oikkonen f5ff4db01f 4-wide hor_sad border agnostic 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 35e7f9a700 Fix hor_sad w8 to work with both borders 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 69687c8d24 Modify hor_sad_sse41_w16 to work over left and right borders 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 686fb2c957 Unroll arbitrary-width SSE4.1 hor_sad by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 768203a2de First version of arbitrary-width SSE4.1 hor_sad 2019-02-04 20:41:40 +02:00
Pauli Oikkonen ccf683b9b6 Start work on left and right border aware hor_sad
Comes with 4, 8, 16 and 32 pixel wide implementations now, at some point
investigate if this can start to thrash icache
2019-02-04 20:41:40 +02:00
Pauli Oikkonen c36482a11a Fix bug in 24-wide SAD
*facepalm*
2019-02-04 20:41:40 +02:00
Pauli Oikkonen f781dc31f0 Create strategy for ver_sad
Easy to vectorize
2019-02-04 20:41:40 +02:00
Pauli Oikkonen 91cb0fbd45 Create strategy for directly obtaining pointer to constant-width SAD function 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 94035be342 Unify unrolling naming conventions 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 517a4338f6 Unroll SSE SAD for 8-wide blocks to process 4 lines at once 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 0f665b28f6 Unroll arbitrary width SSE4.1 SAD by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 84cf771dea Unroll 32 and 16 wide SAD vector implementations by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 5df5c5f8a4 Cast all pointers to const types in vector SAD funcs
Also tidy up the pointer arithmetic
2019-02-04 20:41:40 +02:00
Pauli Oikkonen a711ce3df5 Inline fixed width vectorized SAD functions 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 4cb371184b Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 796568d9cc Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 2eaa7bc9d2 Move SSE4.1 SAD functions to separate header 2019-02-04 20:41:40 +02:00