- Moved implementations for different sizes to inline functions that are
defined using each other, reducing the amount of redundant code.
- Performance of sad_8bit_32x32_avx2 improved by about 10% due to unrolling of
the loop.
-Removed unnecessary <math.h> headers
-Updated AVX/asm optimizations to match the new file hierarchy
-Makefile only compiles .asm files if KVAZAAR_DISABLE_YASM is not set to 1 and TARGET_CPU_ARCH is x86
-Added vzeroupper to satd macro to prevent AVX-SSE transition penalties int picture_x86.asm
-Fixed the order of registers in zero extend macro in picture_x86.asm
-Fixed SATD checkers test pattern in satd_tests.c
- This is necessary in order to compile AVX intrinsics correctly in
Visual Studio. Having everything in their own units should also make
compiling normal C code with optimizations on easier.
- For now the makefile still relies on GCC __target__ attribute for compiling
intrinsics.
- Enforces a little bit more hierarchy. Compilation units are in strategies
and whatever inline includes they have are in a folder with the same name
as the strategy.
- _M_IX86_FP defines whether VS should generate code using SSE or SSE2
instructions. It isn't correct to use it to check whether optional runtime
optimizations should be compiled in. It's also not defined at all in 64-bit
mode.
- So let's just keep it simple and give a list of everything that is supported
as release optimizations. It's not clear from the documentation if all of
these are really supported. It just list a bunch of intrinsics from these
that are.