Use the vectorized general SSE41 inter SAD in AVX reg_sad for shapes
for which we don't have AVX versions yet.
Also improves speed of --smp and --amp a lot. Got a 1.25x speedup for:
--preset=ultrafast -q 27 --gop=lp-g4d3r3t1 --me-early-termination=on --rd=1 --pu-depth-inter=1-3 --smp --amp
* Suite speed_tests:
-PASS inter_sad: 0.898M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 2.503M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
-PASS inter_sad: 115.054M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 133.577M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
The includes should make more sense now and not just happen to compile
due to headers included from other headers.
Used a modified version of IWYU. Modifications were to attribute int8_t
and so on to stdint.h instead of sys/types.h and immintrin.h instead of
more specific headers.
include-what-you-use 0.7 (git:b70df35)
based on clang version 3.9.0 (trunk 264728)
Add module information to all header files.
Update all header file documentations to briefly say what they are, and
to use the javadoc format so the brief actually gets included into the
doxygen documentation.
Remove \file from implementation files, in order to not repeat the info
from the header files.
Add files under strategies and tools to Doxygen and update the Doxygen
settings to be just plain better.
Make README be the main page of Doxygen documentation.
- Everyone who has contributed code to the project has been asked to license
their contributions under LPGL and they have agreed.
- COPYING file changed to say LGPLv2.1 instead of GPLv2.
- GPL changed to LGPL in the header of every single file that a header and
header added to the few that were missing one.
- Also.. Happy new year!
- To allow clang to compile, as it doesn't according to #58.
- The target attributes are not needed anymore due to makefile handling
targetting now.
- The __attribute__((unused)) used for debugging. I don't know if clang
supports this attribute or not but it doesn't seem very important so
I'm removing it just in case.
- This is necessary in order to compile AVX intrinsics correctly in
Visual Studio. Having everything in their own units should also make
compiling normal C code with optimizations on easier.
- For now the makefile still relies on GCC __target__ attribute for compiling
intrinsics.