Mirror of https://github.com/ultravideo/uvg266.git
Synced 2024-11-24 10:34:05 +00:00
14a7bcba25
Use the vectorized general SSE41 inter SAD in the AVX reg_sad for block shapes that don't have AVX versions yet. This also considerably improves the speed of --smp and --amp.

Got a 1.25x speedup with:

--preset=ultrafast -q 27 --gop=lp-g4d3r3t1 --me-early-termination=on --rd=1 --pu-depth-inter=1-3 --smp --amp

* Suite speed_tests:
-PASS inter_sad: 0.898M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 2.503M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
-PASS inter_sad: 115.054M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 133.577M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
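The idea in this commit is to fall back to one vectorized, any-shape SAD kernel whenever no AVX kernel exists for a block shape. It can be sketched in C as follows. This is not the uvg266 code. The names `sad_general_sketch`, `find_specialized_sad` and `reg_sad_sketch` are hypothetical. The vector loop uses `_mm_sad_epu8` (PSADBW, an SSE2 instruction that every SSE4.1 CPU also supports), which is an assumption about the approach rather than a copy of what picture-sse41.c actually does.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_sad_epu8 etc.; SSE2 is a subset of SSE4.1 */
#endif

typedef uint32_t (*sad_fn)(const uint8_t *, const uint8_t *,
                           int, int, int, int);

/* Hypothetical general-shape SAD: 16 pixels per vector step,
 * scalar loop for the remaining width % 16 pixels of each row. */
static uint32_t sad_general_sketch(const uint8_t *a, const uint8_t *b,
                                   int width, int height,
                                   int stride_a, int stride_b)
{
  uint32_t sad = 0;
#if defined(__SSE2__)
  __m128i acc = _mm_setzero_si128(); /* two 64-bit partial sums */
#endif
  for (int y = 0; y < height; ++y) {
    const uint8_t *ra = a + (size_t)y * stride_a;
    const uint8_t *rb = b + (size_t)y * stride_b;
    int x = 0;
#if defined(__SSE2__)
    for (; x + 16 <= width; x += 16) {
      __m128i va = _mm_loadu_si128((const __m128i *)(ra + x));
      __m128i vb = _mm_loadu_si128((const __m128i *)(rb + x));
      /* PSADBW: sum of |va - vb| over each 8-byte half, one per 64-bit lane */
      acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
#endif
    for (; x < width; ++x) {
      sad += (uint32_t)abs(ra[x] - rb[x]);
    }
  }
#if defined(__SSE2__)
  /* Fold both lanes; the low 32 bits suffice for blocks up to 64x64. */
  sad += (uint32_t)_mm_cvtsi128_si32(acc);
  sad += (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
#endif
  return sad;
}

/* Hypothetical lookup for a shape-specific (e.g. AVX) kernel.
 * This sketch registers none, so it always returns NULL. */
static sad_fn find_specialized_sad(int width, int height)
{
  (void)width;
  (void)height;
  return NULL;
}

/* Dispatch: use a specialized kernel if one exists, else the general one. */
uint32_t reg_sad_sketch(const uint8_t *a, const uint8_t *b,
                        int width, int height, int stride_a, int stride_b)
{
  sad_fn fn = find_specialized_sad(width, height);
  if (fn != NULL) {
    return fn(a, b, width, height, stride_a, stride_b);
  }
  return sad_general_sketch(a, b, width, height, stride_a, stride_b);
}
```

With this layout a 64x63 block runs four vector iterations per row, while a 1x1 block goes straight to the scalar tail. Shapes with no dedicated kernel still get SIMD throughput instead of dropping to a plain C loop.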
Files changed:
picture-sse41.c
picture-sse41.h