Ari Koivula
fcce6ae823
Fix printing of AVX2 capability.
2014-06-14 01:24:19 +03:00
Ari Koivula
a49ba2633a
Add OS and CPU detection for AVX2 and AVX.
2014-06-13 16:57:53 +03:00
Ari Koivula
1de102be61
Move strategies to their own compilation units.
...
- Enforces a little bit more hierarchy. Compilation units are in strategies
and whatever inline includes they have are in a folder with the same name
as the strategy.
2014-06-13 15:30:23 +03:00
Ari Koivula
aa3549a717
Change SLEEP(0) to SLEEP(10) on Windows.
...
- This is a workaround for a performance problem on Windows where the main
thread is busy looping.
2014-06-13 12:01:03 +03:00
Laurent Fasnacht
4acadccf89
Only signal the required number of threads
2014-06-13 08:34:59 +02:00
Laurent Fasnacht
70ce7cec20
Remove unnecessary locks by adding threadqueue->queue_running counter
2014-06-13 08:34:58 +02:00
Laurent Fasnacht
7ef34ff5a1
Ability to dump mutex_lock, mutex_unlock and cond_wait timing, if compiled with -D_PTHREAD_DUMP
2014-06-13 08:32:14 +02:00
Laurent Fasnacht
68ad323e84
Tentative fix for race condition
2014-06-12 14:01:33 +02:00
Laurent Fasnacht
b194e19708
Tentative fix for deadlock
2014-06-12 12:57:14 +02:00
Laurent Fasnacht
b765eca153
Remove unneeded encoder_state_blit_pixels
2014-06-12 11:47:46 +02:00
Laurent Fasnacht
da07b8b35d
No-copy works (SAO and deblocking enabled)
2014-06-12 11:47:38 +02:00
Laurent Fasnacht
2cc700fab8
No-copy works with --no-sao (deblocking enabled)
2014-06-12 11:47:31 +02:00
Laurent Fasnacht
6b408b5904
No-copy works with --no-sao --no-deblock
2014-06-12 11:47:30 +02:00
Laurent Fasnacht
0dbfa62698
Replace copy of images made for tiles by sub-images (no copy)
...
- replace width by stride where required in the source code
2014-06-12 11:47:30 +02:00
Laurent Fasnacht
b1347efef5
Add checkpoint in sao_reconstruct
2014-06-12 11:47:29 +02:00
Laurent Fasnacht
ae4dc4eb44
Fix uninitialized sao_info structure members, which were creating false positives when checkpointing SAO
2014-06-12 11:47:29 +02:00
Laurent Fasnacht
f371bdafc3
sao_info checkpoints
2014-06-12 11:47:28 +02:00
Laurent Fasnacht
b7fe81c55c
Checkpoint in pixels_blit, and avoid undefined behaviour when source and destination are the same.
...
Seems a reasonable point to observe when refactoring, since it's called on most image data.
2014-06-12 11:47:28 +02:00
Laurent Fasnacht
da8559fa34
Fix bug in CHECKPOINTS_FINALIZE() when checkpoints are disabled
2014-06-12 11:47:27 +02:00
Laurent Fasnacht
14df6de0d0
Checkpoint on frame checksum
2014-06-12 11:47:00 +02:00
Laurent Fasnacht
22df7cf98b
Use an assert instead of a dumb assignment
2014-06-12 11:47:00 +02:00
Laurent Fasnacht
cf123e317f
Code to checkpoint cu_info and lcu_t
2014-06-12 11:47:00 +02:00
Ari Koivula
ea830d3dd2
Add warning for VLAs in Makefile.
2014-06-12 09:57:08 +03:00
Ari Koivula
443f2f00aa
Fix compilation for VS.
...
- VS2013 does not support variable length arrays.
2014-06-11 17:51:55 +03:00
Laurent Fasnacht
87ed365053
typo fix
2014-06-11 10:29:05 +02:00
Laurent Fasnacht
6ca30367f9
Fix POC bug
2014-06-11 10:29:05 +02:00
Laurent Fasnacht
8437229885
Fix handling of cu_arrays
2014-06-11 10:29:04 +02:00
Laurent Fasnacht
e1d9cb015a
Basic checkpointing system
2014-06-11 10:29:03 +02:00
Laurent Fasnacht
27a49d287d
Big refactor to use videoframe, image_list, and image instead of picture*
2014-06-10 09:19:06 +02:00
Laurent Fasnacht
530faf3951
Move video frame related stuff to videoframe
2014-06-05 14:08:31 +02:00
Laurent Fasnacht
0fac77f9eb
Image now in separate module
2014-06-05 14:04:12 +02:00
Laurent Fasnacht
2456c65822
Replace accesses to picture->cu_array with picture_get_cu and picture_get_cu_const
2014-06-05 10:41:58 +02:00
Laurent Fasnacht
821b71910b
Move picture_list to its own module
2014-06-05 09:49:24 +02:00
Laurent Fasnacht
7372f9244d
Basic infrastructure for OWF
2014-06-05 09:09:25 +02:00
Laurent Fasnacht
16e3a58359
Performance improvement
2014-06-05 06:57:51 +02:00
Laurent Fasnacht
bad6d45e5f
Performance improvement
2014-06-05 06:57:51 +02:00
Laurent Fasnacht
aad2089fcf
Use -ftree-vectorize
2014-06-05 06:57:50 +02:00
Laurent Fasnacht
ea04bcd6a4
AltiVec support for SAD
2014-06-05 06:57:34 +02:00
Ari Koivula
3a7147baf4
Merge branch 't-20140602'
2014-06-04 18:11:15 +03:00
Ari Koivula
31b1bbc215
Address implicit declaration warnings.
2014-06-04 18:00:50 +03:00
Ari Koivula
4f5c87fc5e
Remove duplicate function definition.
2014-06-04 17:56:05 +03:00
Ari Koivula
cb7d7f9e15
Update Makefile.
2014-06-04 17:52:28 +03:00
Ari Koivula
bb47534b88
Make encoder_state .c files their own compilation units.
...
- It's good that this module has been chopped into smaller pieces, but let's
avoid including .c files unless we really have to. These make pretty good
submodules on their own so just make them their own compilation units.
- Move some stuff around to avoid having to forward declare them
in encoderstate.c.
2014-06-04 17:45:18 +03:00
Ari Lemmetti
9e649a8f38
Updated usage message
2014-06-04 15:23:27 +03:00
Laurent Fasnacht
b8acdc784a
Fix compilation of encoder.c with -D_DEBUG
2014-06-03 15:02:14 +02:00
Laurent Fasnacht
961da05235
Split encoderstate.c in multiple files
2014-06-03 14:47:49 +02:00
Laurent Fasnacht
3d07f8cc84
encoderstate refactor
2014-06-03 14:25:16 +02:00
Laurent Fasnacht
2e821b79a9
encoder_state is now in encoder_state.[ch]
2014-06-03 13:51:30 +02:00
Laurent Fasnacht
9bdecbe071
Better thread scheduling
2014-06-03 11:39:16 +02:00
Laurent Fasnacht
0811dbcfbe
Remove unneeded cond_broadcast. Limit contention
2014-06-03 09:45:17 +02:00
Laurent Fasnacht
5ee1319c08
AltiVec detection
2014-06-03 07:55:39 +02:00
Laurent Fasnacht
58ad3b4d26
Log more performance data, also plot how many threads are running
2014-06-03 07:42:22 +02:00
Laurent Fasnacht
5ed69b063b
Strategy selector for array_checksum, basic implementation using precomputed 256*256 block with larger accesses than byte
2014-06-03 07:42:22 +02:00
Ari Koivula
a483e8cb0f
Move cpuid stuff away from compiler namespace.
...
Conflicts:
src/strategyselector.c
2014-05-30 10:08:14 +03:00
Marko Viitanen
6a72f87028
Merge commit '792a5a5dd1946a327f22b2daba05c6645dfa8037'
2014-05-30 08:47:01 +03:00
Marko Viitanen
792a5a5dd1
Small fix for __get_cpuid()
2014-05-30 08:37:03 +03:00
Laurent Fasnacht
642564b6fb
Remove unused variable
2014-05-28 15:04:45 +02:00
Laurent Fasnacht
4f86919d75
Get rid of assembly cpuid for x86, compilation works for powerpc
2014-05-28 15:04:00 +02:00
Ari Koivula
e585da37e5
Give correct transform depth to RDOQ.
...
Conflicts:
src/search.c
2014-05-28 15:47:49 +03:00
Ari Koivula
dceb3da9b8
Fix bug in search relating to transform with no non-zero coefficients.
...
- Because cost was calculated even though there were no coefficients, these
very good modes were less likely to be selected.
- Added assert to encode_coeff_nxn to avoid these problems in the future.
2014-05-28 15:22:18 +03:00
Ari Koivula
ddc02cc09e
Avoid regenerating reference pixels for every rdo mode.
2014-05-22 13:18:28 +03:00
Ari Koivula
dbe13d0cba
Separate sad intra search from rdo search.
2014-05-22 12:47:45 +03:00
Ari Koivula
19ce21e07c
Split final cost to luma and chroma functions.
2014-05-22 09:45:00 +03:00
Ari Koivula
a6962e2974
Separate intra transform coding to luma and chroma functions.
2014-05-22 09:40:34 +03:00
Laurent Fasnacht
3a30a886fc
FREE_POINTER of job->rdepends was at the wrong place (memory leak)
2014-05-22 07:15:18 +02:00
Laurent Fasnacht
3b38777b71
Fix condition depending on uninitialized value in SAO
2014-05-21 16:33:24 +02:00
Laurent Fasnacht
66e730ba94
Fix encoder_state_init, which was making out-of-bounds reads
2014-05-21 14:23:36 +02:00
Laurent Fasnacht
37c20b8ce5
Add dependency between SAO rows
2014-05-21 13:52:56 +02:00
Laurent Fasnacht
90f46dc56f
Threadqueue now has a start index to the first queue job. It improves the speed a little
2014-05-21 12:02:55 +02:00
Laurent Fasnacht
f4f9093cb5
Parallel SAO
2014-05-21 11:48:29 +02:00
Laurent Fasnacht
a3fcb141ed
lcu_order_element now has pointer to neighbor LCUs
2014-05-21 11:06:53 +02:00
Ari Koivula
de76d0a294
Don't add dependency to the above LCU in wavefront if it's not necessary.
...
- The top-right LCU already has a dependency on the top LCU.
2014-05-20 10:48:19 +03:00
Laurent Fasnacht
bdc2d43180
Write bitstream directly after doing the search. This is required since we need the correct entropy status for wpp
2014-05-20 09:29:01 +02:00
Laurent Fasnacht
06532292fc
Wavefronts are in tile coordinates
2014-05-20 09:28:58 +02:00
Ari Koivula
4751a3744b
Fix intra mode search not doing boundary smoothing for DC.
...
- Move the boundary smoothing to the prediction function to make sure it's not
forgotten.
2014-05-19 16:23:17 +03:00
Ari Koivula
f9a603e4ea
Move intra mode search from intra module to search module.
...
- Make the actual intra prediction function global.
- Move the rdo stuff to rdo module.
2014-05-19 16:12:02 +03:00
Ari Koivula
1da94f2085
Stop deblocking from filtering edges not on 8x8 grid.
2014-05-19 15:58:54 +03:00
Ari Koivula
2224e18a46
Make deblocking work with transform splits.
...
- It used to work only with the implicit transform split from LCU size.
2014-05-19 15:58:54 +03:00
Ari Koivula
656b0a321b
Add chroma mode to lcu_set_intra_mode.
...
- This is needed for intra split.
2014-05-19 15:58:54 +03:00
Ari Koivula
921f58b249
Add tr_split to lcu_set_intra_mode.
2014-05-19 15:58:54 +03:00
Ari Koivula
846b608125
Add transform split recursion to intra reconstruction.
2014-05-19 15:58:54 +03:00
Ari Koivula
63f6cad5a0
Include global.h in thread modules.
2014-05-19 15:58:16 +03:00
Ari Koivula
551b087b47
Remove a bunch of unnecessary code from encode_transform_unit.
...
- Really, it's useless. Selecting scan order isn't this hard.
- Checked from HM that ctx_idx doesn't have anything to do with contexts.
2014-05-16 17:42:40 +03:00
Ari Koivula
f73bef0941
Remove unused include.
2014-05-16 16:09:59 +03:00
Laurent Fasnacht
6fdb821b14
Fix memory leaks
2014-05-16 12:20:40 +02:00
Laurent Fasnacht
d4a6aed471
Multi-row jobs
2014-05-16 12:20:40 +02:00
Marko Viitanen
94285fbed7
Fixed compiling on Visual Studio with _DEBUG defined
2014-05-16 12:22:06 +03:00
Marko Viitanen
86155ef1ba
Added Windows-specific timing macros for thread debugging
2014-05-16 12:16:22 +03:00
Laurent Fasnacht
36945e89ce
Stubs to be able to make a portable version of the profiling
2014-05-16 10:15:05 +02:00
Laurent Fasnacht
53b0835316
Improve handling of jobs when not using threads
2014-05-16 08:50:43 +02:00
Laurent Fasnacht
519750d630
Write bitstream of a wavefront in a parallel way
2014-05-16 08:50:42 +02:00
Laurent Fasnacht
7473ac1bfc
Able to log time in a simple way
2014-05-16 08:50:42 +02:00
Laurent Fasnacht
86e01284b8
Add -lrt
2014-05-16 08:48:54 +02:00
Laurent Fasnacht
4f73a7fc91
Instrument threads in order to be able to do some visualization
2014-05-16 08:44:32 +02:00
Ari Koivula
a7cd31d87b
Update the names of some bins to the current spec.
...
- Helps with debugging.
2014-05-16 05:44:03 +03:00
Ari Koivula
ab4041c8fc
Change cabac debug statements to show information better.
...
- Show the number of bits when encoding multiple bins. I would like just the
bits themselves in string form, but that's too much trouble for this.
- Print them as unsigned and coerce them to unsigned, as they are going
to get coerced to unsigned by the function call anyway.
- Change state to be less verbose.
2014-05-16 05:44:03 +03:00
Ari Koivula
c9a8756fbd
Fix NxN scan mode for lcu_get_final_cost.
...
- Scan mode was always selected according to the first PU mode.
2014-05-15 16:20:35 +03:00
Marko Viitanen
b08047cce9
Fixed intra chroma mode selection
2014-05-15 09:50:05 +03:00
Ari Koivula
f0e990905e
Remove chroma mode "36".
...
- It's an unnecessary chore to handle this special case everywhere (it means
chroma_mode == intra_mode). Better just to use the actual mode.
2014-05-14 19:56:35 +03:00
Ari Koivula
60a0ba4280
Update VS project files to link win32-pthread.
...
- I haven't found a good way of including external dependencies to VS projects
yet. Win32-pthreads is assumed to be found at the same level as kvazaar dir
and has the files x86/pthreadVC2.lib and x64/pthreadVC2.lib.
- Win32-pthreads also requires the pthreadVC2.dll to be in PATH when running
the program. Not sure what to do about that yet. We might need an installer
for Windows to handle that.
- Disable openmp as it's no longer used.
- Stop linking Ws2_32.lib as that hasn't been used for ages.
2014-05-14 17:54:34 +03:00