Pauli Oikkonen | 6d43759604 | 2019-03-07 18:01:22 +02:00
Create a border-respecting 32-wide AVX hor_sad

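The entry above refers to a "border-respecting" hor_sad without defining the term. The sketch below is a minimal scalar illustration under one assumption: reference-block columns that fall outside the frame (x < 0 or x >= ref_width) are replaced by the nearest in-frame column of the same row, which is edge extension. `hor_sad_ref` and all of its parameters are hypothetical names, not the project's API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for a border-respecting horizontal SAD.
 * Assumption: reference columns outside [0, ref_width) are replaced by
 * the nearest in-frame column of the same row (edge extension).
 * `ref` points at column 0 of the reference frame row for block row 0;
 * `ref_x` is the block's horizontal offset and may be negative. */
static uint32_t hor_sad_ref(const uint8_t *pic, const uint8_t *ref,
                            int width, int height,
                            int pic_stride, int ref_stride,
                            int ref_x, int ref_width)
{
  uint32_t sad = 0;
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      int rx = ref_x + x;                          /* column in the frame */
      if (rx < 0) rx = 0;                          /* clamp to left border */
      if (rx > ref_width - 1) rx = ref_width - 1;  /* clamp to right border */
      int diff = pic[y * pic_stride + x] - ref[y * ref_stride + rx];
      sad += (uint32_t)abs(diff);
    }
  }
  return sad;
}
```

For example, with a 4x1 block of value 10 and a 2-pixel reference row {0, 20} starting at `ref_x = -1`, the columns -1, 0, 1, 2 clamp to 0, 0, 1, 1. Every difference is 10, so the SAD is 40.
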
Pauli Oikkonen | f218cecb38 | 2019-03-05 22:51:41 +02:00
Remove offending hor_sad_avx2_w32 function
Consider possibly creating a non-offending AVX2 version instead, the
way hor_sad_sse41_w32 works. Or maybe there's more essential work to
do.

Pauli Oikkonen | d8b8923028 | 2019-02-18 17:52:47 +02:00
Add LGPL notices to reg_sad headers

Pauli Oikkonen | 2d05ca8520 | 2019-02-04 20:41:40 +02:00
Remove width from constant-width hor_sad func params
They should kinda know it already

Pauli Oikkonen | dd7d989a39 | 2019-02-04 20:41:40 +02:00
Implement 32-wide hor_sad on AVX2

Pauli Oikkonen | cbca3347b5 | 2019-02-04 20:41:40 +02:00
Unroll 64-wide AVX2 SAD by 2

Pauli Oikkonen | 84cf771dea | 2019-02-04 20:41:40 +02:00
Unroll 32 and 16 wide SAD vector implementations by 4

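Two ideas in this log can be shown together in plain C: loop unrolling and constant-width SAD functions that take no width parameter. The sketch below is a minimal scalar version. It assumes "unroll by 4" means handling four rows per loop iteration, although the unroll axis used in the vector code is not stated here. `row_sad_w16` and `sad_w16_unroll4` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch, not the project's SIMD code.
 * SAD of one 16-pixel row; the width is fixed by the function itself. */
static uint32_t row_sad_w16(const uint8_t *a, const uint8_t *b)
{
  uint32_t s = 0;
  for (int x = 0; x < 16; x++)
    s += (uint32_t)abs(a[x] - b[x]);
  return s;
}

/* 16-wide block SAD with no width parameter; rows are processed four
 * per iteration (unrolled by 4), with a tail loop for leftover rows. */
static uint32_t sad_w16_unroll4(const uint8_t *pic, const uint8_t *ref,
                                int height, int pic_stride, int ref_stride)
{
  uint32_t sad = 0;
  int y = 0;
  for (; y + 4 <= height; y += 4) {
    /* Four independent partial sums; a SIMD version could keep these
     * in separate registers before combining them. */
    uint32_t s0 = row_sad_w16(pic + (y + 0) * pic_stride, ref + (y + 0) * ref_stride);
    uint32_t s1 = row_sad_w16(pic + (y + 1) * pic_stride, ref + (y + 1) * ref_stride);
    uint32_t s2 = row_sad_w16(pic + (y + 2) * pic_stride, ref + (y + 2) * ref_stride);
    uint32_t s3 = row_sad_w16(pic + (y + 3) * pic_stride, ref + (y + 3) * ref_stride);
    sad += s0 + s1 + s2 + s3;
  }
  for (; y < height; y++)  /* tail rows when height % 4 != 0 */
    sad += row_sad_w16(pic + y * pic_stride, ref + y * ref_stride);
  return sad;
}
```

Because the width is baked into the function, the loop bound is a compile-time constant, which gives the compiler more room to optimize. The caller still supplies height and strides.
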
Pauli Oikkonen | a711ce3df5 | 2019-02-04 20:41:40 +02:00
Inline fixed width vectorized SAD functions

Pauli Oikkonen | 6504145cce | 2019-02-04 20:41:40 +02:00
Remove 16-pixel wide AVX2 SAD implementation
At least on Skylake, it's noticeably slower than the very simple
version using SSE4.1

Pauli Oikkonen | 4cb371184b | 2019-02-04 20:41:40 +02:00
Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16

Pauli Oikkonen | 796568d9cc | 2019-02-04 20:41:40 +02:00
Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64

Pauli Oikkonen | 4d45d828fa | 2019-02-04 20:41:40 +02:00
Use constant-width SSE4.1 SAD funcs for AVX2