Commit graph

17 commits

Author SHA1 Message Date
Pauli Oikkonen 35e7f9a700 Fix hor_sad w8 to work with both borders 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 69687c8d24 Modify hor_sad_sse41_w16 to work over left and right borders 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 686fb2c957 Unroll arbitrary-width SSE4.1 hor_sad by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 768203a2de First version of arbitrary-width SSE4.1 hor_sad 2019-02-04 20:41:40 +02:00
Pauli Oikkonen ccf683b9b6 Start work on left and right border aware hor_sad 2019-02-04 20:41:40 +02:00
    Comes with 4, 8, 16 and 32 pixel wide implementations now, at some point
    investigate if this can start to thrash icache
Pauli Oikkonen c36482a11a Fix bug in 24-wide SAD 2019-02-04 20:41:40 +02:00
    *facepalm*
Pauli Oikkonen f781dc31f0 Create strategy for ver_sad 2019-02-04 20:41:40 +02:00
    Easy to vectorize
Pauli Oikkonen 91cb0fbd45 Create strategy for directly obtaining pointer to constant-width SAD function 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 94035be342 Unify unrolling naming conventions 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 517a4338f6 Unroll SSE SAD for 8-wide blocks to process 4 lines at once 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 0f665b28f6 Unroll arbitrary width SSE4.1 SAD by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 84cf771dea Unroll 32 and 16 wide SAD vector implementations by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 5df5c5f8a4 Cast all pointers to const types in vector SAD funcs 2019-02-04 20:41:40 +02:00
    Also tidy up the pointer arithmetic
Pauli Oikkonen a711ce3df5 Inline fixed width vectorized SAD functions 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 4cb371184b Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 796568d9cc Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 2eaa7bc9d2 Move SSE4.1 SAD functions to separate header 2019-02-04 20:41:40 +02:00
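The SSE4.1 and AVX2 strategies in these commits are vectorized forms of the sum of absolute differences (SAD) between a block and a reference. A plain scalar loop is handy as a correctness baseline when checking them. The sketch below makes several assumptions. It uses 8-bit pixels and separate strides for the picture and the reference. It treats ver_sad as comparing every row of the block against one fixed reference row. The names reg_sad_ref and ver_sad_ref are hypothetical and are not the project's actual API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar baseline for block SAD: sums |pic - ref| over a
   width x height block. 8-bit pixels and per-buffer strides are assumed. */
static uint32_t reg_sad_ref(const uint8_t *pic, const uint8_t *ref,
                            int width, int height,
                            unsigned pic_stride, unsigned ref_stride)
{
  uint32_t sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* uint8_t operands promote to int, so the difference can be negative */
      sad += (uint32_t)abs(pic[y * pic_stride + x] - ref[y * ref_stride + x]);
    }
  }
  return sad;
}

/* Assumed ver_sad semantics: every block row is compared against the same
   single reference row (e.g. a row replicated past a picture border). */
static uint32_t ver_sad_ref(const uint8_t *pic, const uint8_t *ref_row,
                            int width, int height, unsigned pic_stride)
{
  uint32_t sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      sad += (uint32_t)abs(pic[y * pic_stride + x] - ref_row[x]);
    }
  }
  return sad;
}
```

A vectorized kernel such as a 16- or 32-wide SSE4.1 or AVX2 variant should return exactly the same value as the scalar loop for the same inputs. Comparing the two on random blocks is a cheap way to catch bugs like the 24-wide one fixed above.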