Pauli Oikkonen
|
448eacba7b
|
Avoid overreading block borders in hor_sad_sse41_arbitrary
|
2019-03-05 22:34:50 +02:00 |
|
Pauli Oikkonen
|
9b0e079262
|
Use SSE instructions for 64-bit SADs instead of MMX
VC++ seems to choke on MMX instructions
|
2019-02-18 20:13:33 +02:00 |
|
Pauli Oikkonen
|
d8b8923028
|
Add LGPL notices to reg_sad headers
|
2019-02-18 17:52:47 +02:00 |
|
Pauli Oikkonen
|
770db825b9
|
Create hor_sad_w8 and w4 epol mask the way w16 works
|
2019-02-06 19:34:26 +02:00 |
|
Pauli Oikkonen
|
aa19bcac8a
|
Avoid branching in creating shuffle mask in hor_sad_w16
|
2019-02-06 18:58:46 +02:00 |
|
Pauli Oikkonen
|
2d05ca8520
|
Remove width from constant-width hor_sad func params
They should kinda know it already
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
57db234d95
|
Move 32-wide SSE4.1 hor_sad to picture-sse41.c
It's not used by picture-avx2.c, which also includes the header, so
it should not be in the header
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
f5ff4db01f
|
4-wide hor_sad border agnostic
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
35e7f9a700
|
Fix hor_sad w8 to work with both borders
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
69687c8d24
|
Modify hor_sad_sse41_w16 to work over left and right borders
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
686fb2c957
|
Unroll arbitrary-width SSE4.1 hor_sad by 4
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
768203a2de
|
First version of arbitrary-width SSE4.1 hor_sad
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
ccf683b9b6
|
Start work on left and right border aware hor_sad
Comes with 4, 8, 16 and 32 pixel wide implementations now. At some point,
investigate whether this can start to thrash the icache
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
c36482a11a
|
Fix bug in 24-wide SAD
*facepalm*
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
f781dc31f0
|
Create strategy for ver_sad
Easy to vectorize
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
91cb0fbd45
|
Create strategy for directly obtaining pointer to constant-width SAD function
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
94035be342
|
Unify unrolling naming conventions
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
517a4338f6
|
Unroll SSE SAD for 8-wide blocks to process 4 lines at once
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
0f665b28f6
|
Unroll arbitrary width SSE4.1 SAD by 4
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
84cf771dea
|
Unroll 32 and 16 wide SAD vector implementations by 4
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
5df5c5f8a4
|
Cast all pointers to const types in vector SAD funcs
Also tidy up the pointer arithmetic
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
a711ce3df5
|
Inline fixed width vectorized SAD functions
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
4cb371184b
|
Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
796568d9cc
|
Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64
|
2019-02-04 20:41:40 +02:00 |
|
Pauli Oikkonen
|
2eaa7bc9d2
|
Move SSE4.1 SAD functions to separate header
|
2019-02-04 20:41:40 +02:00 |
|
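For readers unfamiliar with the terms used in the commit subjects above, here is a minimal scalar sketch of a sum of absolute differences (SAD), a variant unrolled to process 4 lines per iteration, and a border-aware horizontal variant. It is an illustration only, not the repository's SSE4.1 or AVX2 code. All function names and parameters (sad_row, sad_plain, sad_unrolled4, hor_sad_clamped, left, right) are hypothetical. The border handling assumes that reference columns outside the picture are replaced by the nearest column inside it, which is one plausible reading of "border aware"; the log itself does not spell this out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* SAD of a single row of `width` pixels. */
static unsigned sad_row(const uint8_t *pic, const uint8_t *ref, int width)
{
  unsigned sad = 0;
  for (int x = 0; x < width; x++) {
    sad += (unsigned)abs(pic[x] - ref[x]);
  }
  return sad;
}

/* Straightforward SAD over a width x height block, one line at a time. */
static unsigned sad_plain(const uint8_t *pic, const uint8_t *ref,
                          int width, int height,
                          unsigned pic_stride, unsigned ref_stride)
{
  unsigned sad = 0;
  for (int y = 0; y < height; y++) {
    sad += sad_row(pic + y * pic_stride, ref + y * ref_stride, width);
  }
  return sad;
}

/* Same result, unrolled by 4: four lines per iteration feed four
 * independent accumulators, then a residual loop handles height % 4. */
static unsigned sad_unrolled4(const uint8_t *pic, const uint8_t *ref,
                              int width, int height,
                              unsigned pic_stride, unsigned ref_stride)
{
  const int fourline_end = height & ~3;
  unsigned a = 0, b = 0, c = 0, d = 0;
  int y;
  for (y = 0; y < fourline_end; y += 4) {
    a += sad_row(pic + (y + 0) * pic_stride, ref + (y + 0) * ref_stride, width);
    b += sad_row(pic + (y + 1) * pic_stride, ref + (y + 1) * ref_stride, width);
    c += sad_row(pic + (y + 2) * pic_stride, ref + (y + 2) * ref_stride, width);
    d += sad_row(pic + (y + 3) * pic_stride, ref + (y + 3) * ref_stride, width);
  }
  for (; y < height; y++) {
    a += sad_row(pic + y * pic_stride, ref + y * ref_stride, width);
  }
  return a + b + c + d;
}

/* Border-aware horizontal SAD (assumed semantics): the `left` leftmost and
 * `right` rightmost reference columns lie outside the picture, so they are
 * replaced by the nearest valid reference column. */
static unsigned hor_sad_clamped(const uint8_t *pic, const uint8_t *ref,
                                int width, int height,
                                unsigned pic_stride, unsigned ref_stride,
                                int left, int right)
{
  assert(left >= 0 && right >= 0 && left + right < width);
  const int first = left;              /* first valid reference column */
  const int last = width - right - 1;  /* last valid reference column */
  unsigned sad = 0;
  for (int y = 0; y < height; y++) {
    const uint8_t *pic_row = pic + y * pic_stride;
    const uint8_t *ref_row = ref + y * ref_stride;
    for (int x = 0; x < width; x++) {
      const int rx = x < first ? first : (x > last ? last : x);
      sad += (unsigned)abs(pic_row[x] - ref_row[rx]);
    }
  }
  return sad;
}
```

The separate accumulators in the unrolled version mirror a common motivation for unrolling vector code: the four partial sums do not depend on each other, so they can be computed in parallel. Clamping the column index avoids branching per border case, at the cost of one comparison pair per pixel in this scalar form.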