Pauli Oikkonen
|
f5ff4db01f
|
4-wide hor_sad border agnostic
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
35e7f9a700
|
Fix hor_sad w8 to work with both borders
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
69687c8d24
|
Modify hor_sad_sse41_w16 to work over left and right borders
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
686fb2c957
|
Unroll arbitrary-width SSE4.1 hor_sad by 4
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
768203a2de
|
First version of arbitrary-width SSE4.1 hor_sad
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
ccf683b9b6
|
Start work on left and right border aware hor_sad
Comes with 4, 8, 16 and 32 pixel wide implementations now; at some point,
investigate whether this can start to thrash the icache
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
c36482a11a
|
Fix bug in 24-wide SAD
*facepalm*
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
f781dc31f0
|
Create strategy for ver_sad
Easy to vectorize
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
91cb0fbd45
|
Create strategy for directly obtaining pointer to constant-width SAD function
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
94035be342
|
Unify unrolling naming conventions
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
517a4338f6
|
Unroll SSE SAD for 8-wide blocks to process 4 lines at once
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
0f665b28f6
|
Unroll arbitrary width SSE4.1 SAD by 4
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
84cf771dea
|
Unroll 32 and 16 wide SAD vector implementations by 4
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
5df5c5f8a4
|
Cast all pointers to const types in vector SAD funcs
Also tidy up the pointer arithmetic
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
a711ce3df5
|
Inline fixed width vectorized SAD functions
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
4cb371184b
|
Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
796568d9cc
|
Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
2eaa7bc9d2
|
Move SSE4.1 SAD functions to separate header
|
2019-02-04 20:41:40 +02:00 |
|