Ari Lemmetti
47e3bcfb50
Fixed incorrect shift values for inverse transforms in generic strategy
2014-08-07 16:01:30 +03:00
Ari Lemmetti
709520a233
Removed all AVX2 instructions from SATD functions.
...
-Zero extend macro now returns results in 2 xmm registers instead of one ymm
2014-07-31 13:25:28 +03:00
Ari Lemmetti
0beb278f5b
Partial butterfly strategy is now called DCT strategy. Made changes to transform functions in preparation for optimizations.
...
-Moved fast_forward_dst and fast_inverse_dst to DCT strategies
2014-07-31 13:25:28 +03:00
Ari Lemmetti
6bf63bd171
Added AVX2 strategy for partial butterfly (no optimizations yet)
2014-07-31 13:25:28 +03:00
Ari Lemmetti
faccc4f09b
Partial butterfly functions now utilize the strategy selector
2014-07-31 13:25:28 +03:00
Ari Koivula
c2fac805d7
Give HAVE_ALIGNED_STACK to yasm on windows.
...
- Linux gets it through some other means but on windows it needs to be
given explicitly.
- Fixes issue #78 .
2014-07-30 16:26:23 +03:00
Ari Koivula
669e99dd7f
Improve intra SAD AVX2 intrinsics.
...
- Moved implementations for different sizes to inline functions that are
defined using each other, reducing the amount of redundant code.
- Performance of sad_8bit_32x32_avx2 improved by about 10% due to unrolling of
the loop.
2014-07-25 15:59:55 +03:00
Ari Koivula
e00102f0ca
Compile asm optimizations only if yasm is present.
2014-07-23 14:57:40 +03:00
Ari Lemmetti
85fb0784e4
Fixed intendentation and added some empty lines for readability
2014-07-23 12:32:27 +03:00
Ari Lemmetti
bd6e89c1f0
Updated include directories and file names to Makefile
2014-07-22 15:36:54 +03:00
Ari Lemmetti
4f88ebce5a
Added comments and made visual studio not to compile x86inc.asm
2014-07-22 15:07:57 +03:00
Ari Koivula
cfd3636e08
Move some repetitive SATD asm into a macro.
...
Conflicts:
src/strategies/x86_avx/picture_x86.asm
2014-07-22 12:46:39 +03:00
Ari Lemmetti
c81639dd09
Removed old unused macro
2014-07-22 11:11:20 +03:00
Ari Lemmetti
cf0797cafd
Reordered and intended assembly code
2014-07-22 11:07:42 +03:00
Ari Lemmetti
fea44c8234
Renaming AVX/asm files
...
-Splitted SAD and SATD functions in separate files
2014-07-21 18:02:01 +03:00
Ari Lemmetti
a64df6f0d0
Merge branch 'asm'
...
Conflicts:
build/kvazaar_lib/kvazaar_lib.vcxproj.filters
src/Makefile
src/strategies/strategies-picture.c
2014-07-21 16:41:09 +03:00
Ari Lemmetti
1be2c3aae5
Preparing push to master and misc
...
-Removed unnecessary <math.h> headers
-Updated AVX/asm optimizations to match the new file hierarchy
-Makefile only compiles .asm files if KVAZAAR_DISABLE_YASM is not set to 1 and TARGET_CPU_ARCH is x86
2014-07-21 12:39:56 +03:00
Ari Koivula
a8f7103797
Add AVX2 implementations for sad_8bit_ 8x8, 16x16 and 32x32.
2014-07-18 18:27:30 +03:00
Ari Koivula
3daa5dd1f1
Add sse2 implementaton for sad_8bit_4x4.
2014-07-18 18:20:34 +03:00
Ari Koivula
f49332c9b8
Add missing includes.
2014-07-18 17:56:15 +03:00
Ari Koivula
291817667f
Tidy up the Makefile.
2014-07-18 17:31:18 +03:00
Ari Koivula
e241866f43
Compile intrinsic functions with appropriate flags in gcc.
...
- Remove -march=native as it's no longer necessary for intrinsics to work.
Closes #77 .
- I couldn't test altivec or sse4.1, but sse4.1 compiles so I expect it
to work.
2014-07-18 17:28:14 +03:00
Ari Koivula
5662621b3c
Free threadqueue jobs when they are not needed.
...
- Also add destroying the mutex when the job is freed.
- This makes Kvazaar no longer acquire thousands of OS handles on Windows.
2014-07-16 16:51:20 +03:00
Ari Lemmetti
1e94262f85
Made AVX asm compatible with the changed system
...
- x86inc.asm is now located in extras
- Removed unused cpu.asm/h
2014-07-14 18:51:17 +03:00
Ari Lemmetti
683eda1183
Merge branch 'master' into asm
...
Conflicts:
build/kvazaar_lib/kvazaar_lib.vcxproj
build/kvazaar_lib/kvazaar_lib.vcxproj.filters
src/Makefile
src/strategies/strategies-picture.c
2014-07-14 16:42:33 +03:00
Ari Lemmetti
7f873e037c
Updated Makefile to compile picture_x86.asm
2014-07-14 15:30:08 +03:00
Ari Lemmetti
2169f9ab8c
Added AVX asm comments and fixes
...
-Added vzeroupper to satd macro to prevent AVX-SSE transition penalties int picture_x86.asm
-Fixed the order of registers in zero extend macro in picture_x86.asm
-Fixed SATD checkers test pattern in satd_tests.c
2014-07-14 14:43:36 +03:00
Ari Koivula
5d0df56c94
Move optimizations to their own compilation units according to target.
...
- This is necessary in order to compile AVX intrinsics correctly in
Visual Studio. Having everything in their own units should also make
compiling normal C code with optimizations on easier.
- For now the makefile still relies on GCC __target__ attribute for compiling
intrinsics.
2014-07-11 17:26:19 +03:00
Ari Koivula
f605d6c35b
Align intra buffers to 32 bytes for 256 bit SIMD instructions.
2014-07-11 17:26:19 +03:00
Ari Koivula
fbd03b706e
Reconfigure VS project.
...
- Moved compilation flag stuff from project file to the abstraction layer.
- Disabled randomized base address as unnecessary.
- Disable stack buffer security check from release.
2014-07-11 17:26:19 +03:00
Laurent Fasnacht
72abc69b3d
Measure time for SAD in _DEBUG mode
2014-07-08 11:42:58 +02:00
Laurent Fasnacht
1a318c714d
log poc with new_frame
2014-07-08 11:42:19 +02:00
Laurent Fasnacht
e64a692780
Add CU type in threadqueue.log
2014-07-08 09:06:31 +02:00
Laurent Fasnacht
abfbb7cad3
Fix duplicate type key in threadqueue.log
2014-07-07 11:36:50 +02:00
Laurent Fasnacht
946e3b9651
Log search_cu to threadqueue.log
2014-07-07 10:50:05 +02:00
Laurent Fasnacht
f62e571c15
Add missing info to threadqueue.log
2014-07-07 10:49:40 +02:00
Ari Lemmetti
048127c7e3
AVX assembly optimizations improved
2014-07-02 16:57:06 +03:00
Ari Koivula
7ecf78bb70
Use sqrt lambda cost for searches not using SSD.
...
- Add encoder_state->global->cur_lambda_cost_sqrt.
- Use sqrt lambda for inter search and rough intra search.
- The effect on inter is around 10-20% bdrate. The effect on intra is smaller
and non-existent when --rd=2 is enabled, as the intra search refinement was
already done with SSD and correct lambda.
2014-06-26 13:56:38 +03:00
Laurent Fasnacht
1112dca933
Fix compilation issue with assertion disabled
2014-06-26 07:31:37 +02:00
Laurent Fasnacht
9ab9defe67
Bitstream length per frame works again
2014-06-19 10:24:03 +02:00
Laurent Fasnacht
45faadb2c9
Fix bug where the wrong number of frames could be encoded (if one frame takes longer than the others)
2014-06-19 10:24:02 +02:00
Ari Koivula
d5a77be4b8
Fix avx detection for gcc.
...
- GCC doesn't support _xgetbv intrinsic so we have to use inline assembler.
2014-06-18 11:50:17 +03:00
Ari Lemmetti
bdef5384ef
Added AVX strategy
2014-06-17 16:52:24 +03:00
Ari Koivula
d7abe6a7c2
Address compilation warning.
...
strategyselector.c:170:10: error: ‘__get_cpuid’ is static but used in inline function ‘get_cpuid’ which is not static [-Werror]
return __get_cpuid(level, eax, ebx, ecx, edx);
2014-06-17 16:26:55 +03:00
Ari Koivula
60ecc6baae
Remove unused stuff.
2014-06-17 16:20:01 +03:00
Ari Koivula
7532b789f8
Add -std=gnu99 for gcc.
...
- std=c99 doesn't work because then struct timespec won't be defined.
2014-06-17 16:15:39 +03:00
Ari Koivula
94bc457b6c
Add option to disable fast intra search.
2014-06-17 15:32:05 +03:00
Ari Koivula
e27fc875c0
Clean up intra search.
2014-06-17 15:09:12 +03:00
Ari Koivula
e4d70ac1ab
Use more starting points for smaller blocks in intra search.
2014-06-17 13:28:27 +03:00
Ari Koivula
9911c7553b
Avoid unnecessary intra dir searching.
2014-06-17 13:11:35 +03:00
Ari Koivula
bd16a55b9b
Always check DC and planar intra modes.
...
- At least one of them is always in predicted modes, but to make sure they
are both included add them explicitly.
2014-06-17 12:51:15 +03:00
Ari Koivula
70740da123
Add smarter rough intra search.
...
- Directional intra mode search is done using halving search from the best
known mode. Starting modes are vertical, horizontal and the 3 diagonal
modes.
Conflicts:
src/search.c
2014-06-17 12:33:10 +03:00
Marko Viitanen
0e2fe9e7ff
Changed intra search to skip some modes speeding it up
2014-06-17 12:32:29 +03:00
Marko Viitanen
a1c3cfe944
Moved intra mode cost calculation to a function
...
Conflicts:
src/search.c
2014-06-17 12:32:29 +03:00
Marko Viitanen
eb7d46f9ef
Modify CU split cost.
2014-06-17 12:30:32 +03:00
Marko Viitanen
bfa37b876b
Conformance fix: set sps_max_dec_pic_buffering to correct value
2014-06-17 12:30:32 +03:00
Ari Koivula
b3c15b8f94
Merge branch 'owf'
2014-06-16 16:07:41 +03:00
Laurent Fasnacht
91de92134f
Constrain the search not to go under the LCU below if OWF is enabled
2014-06-16 14:27:56 +02:00
Laurent Fasnacht
ef9c2258e9
Fix frame counter and stats
2014-06-16 13:21:52 +02:00
Ari Koivula
153b1ee41f
Merge branch 'intra-sad-strategies'
2014-06-16 12:34:37 +03:00
Laurent Fasnacht
84d34c2655
Fix compilation on non-intel
2014-06-16 11:24:02 +02:00
Ari Koivula
3f00592b96
Separate strategyselector debug prints from _DEBUG.
...
- I only want to see the strategy stuff.
2014-06-16 12:15:19 +03:00
Ari Koivula
1c97a10a6d
Move intra SAD and SATD functions under strategies.
2014-06-16 12:13:41 +03:00
Laurent Fasnacht
4b4702819b
Also print encoding FPS
2014-06-16 11:10:11 +02:00
Laurent Fasnacht
2347574a8e
Fix problems revealed by valgrind
2014-06-16 11:10:09 +02:00
Laurent Fasnacht
28c3f22ba1
Fix possible freeze
2014-06-16 11:03:48 +02:00
Laurent Fasnacht
a96c742ad4
Fix depends for wpp+owf
2014-06-16 11:03:47 +02:00
Laurent Fasnacht
f99e41d41f
Improved CPU time statistics
2014-06-16 11:03:46 +02:00
Laurent Fasnacht
8a33c0a688
Fix recon job for wfrow
2014-06-16 10:55:01 +02:00
Laurent Fasnacht
bf6024734a
Fix statistics with OWF
2014-06-16 10:55:00 +02:00
Laurent Fasnacht
0522a3d8e5
--owf option
2014-06-16 10:55:00 +02:00
Laurent Fasnacht
47d1ded7b0
Dependencies between frames
2014-06-16 10:54:59 +02:00
Laurent Fasnacht
003d3c504c
image_list_copy_contents
2014-06-16 10:54:58 +02:00
Laurent Fasnacht
f4187dd10c
cu_array data structure
2014-06-16 10:54:57 +02:00
Laurent Fasnacht
3be3fa8d6e
Use different processing order depending if we have OWF or not
2014-06-16 10:54:56 +02:00
Laurent Fasnacht
c32943f78b
OWF
2014-06-16 10:54:56 +02:00
Laurent Fasnacht
490dd15f3d
Remove flush between frame
2014-06-16 10:51:33 +02:00
Laurent Fasnacht
fddcbabe28
bitstream writing is now a "normal" job in a thread
2014-06-16 10:51:32 +02:00
Laurent Fasnacht
ff7143cc24
Assign thread_queue_jobs and move image_free to a more suitable place
2014-06-16 10:51:32 +02:00
Ari Koivula
87ca828a63
Correct intra sad function labels.
...
- These haven't been 16 bit for a long time.
2014-06-16 10:45:10 +03:00
Ari Koivula
fcce6ae823
Fix printing of AVX2 capability.
2014-06-14 01:24:19 +03:00
Ari Koivula
a49ba2633a
Add OS and CPU detection for AVX2 and AVX.
2014-06-13 16:57:53 +03:00
Ari Koivula
1de102be61
Move strategies to their own compilation units.
...
- Enforces a little bit more hierarchy. Compilation units are in strategies
and whatever inline includes they have are in a folder with the same name
as the strategy.
2014-06-13 15:30:23 +03:00
Ari Koivula
aa3549a717
Change SLEEP(0) to SLEEP(10) on Windows.
...
- This is a workaround for a performance problem on Windows where main thread
is busy looping.
2014-06-13 12:01:03 +03:00
Laurent Fasnacht
4acadccf89
Only signal the required number of threads
2014-06-13 08:34:59 +02:00
Laurent Fasnacht
70ce7cec20
Remove unneccessary locks by adding threadqueue->queue_running counter
2014-06-13 08:34:58 +02:00
Laurent Fasnacht
7ef34ff5a1
Ability to dump mutex_lock, mutex_unlock and cond_wait timing, if compiled with -D_PTHREAD_DUMP
2014-06-13 08:32:14 +02:00
Laurent Fasnacht
68ad323e84
Tentative fix for race condition
2014-06-12 14:01:33 +02:00
Laurent Fasnacht
b194e19708
Tentative fix for deadlock
2014-06-12 12:57:14 +02:00
Laurent Fasnacht
b765eca153
Remove unneeded encoder_state_blit_pixels
2014-06-12 11:47:46 +02:00
Laurent Fasnacht
da07b8b35d
No-copy works (SAO and deblocking enabled)
2014-06-12 11:47:38 +02:00
Laurent Fasnacht
2cc700fab8
No-copy works with --no-sao (deblocking enabled)
2014-06-12 11:47:31 +02:00
Laurent Fasnacht
6b408b5904
No-copy works with --no-sao --no-deblock
2014-06-12 11:47:30 +02:00
Laurent Fasnacht
0dbfa62698
Replace copy of images made for tiles by sub-images (no copy)
...
- replace width by stride where required in the source code
2014-06-12 11:47:30 +02:00
Laurent Fasnacht
b1347efef5
Add checkpoint in sao_reconstruct
2014-06-12 11:47:29 +02:00
Laurent Fasnacht
ae4dc4eb44
Fix uninitialized sao_info structure members, which was creating false positive when checkpointing SAO
2014-06-12 11:47:29 +02:00
Laurent Fasnacht
f371bdafc3
sao_info checkpoints
2014-06-12 11:47:28 +02:00
Laurent Fasnacht
b7fe81c55c
Checkpoint in pixels_blit, and avoid doing undefined behaviour when source and destination is the same.
...
Seems a reasonnable point to observe when refactoring, since it's called on most image data.
2014-06-12 11:47:28 +02:00
Laurent Fasnacht
da8559fa34
Fix bug in CHECKPOINTS_FINALIZE() when checkpoints are disabled
2014-06-12 11:47:27 +02:00
Laurent Fasnacht
14df6de0d0
Checkpoint on frame checksum
2014-06-12 11:47:00 +02:00