Pauli Oikkonen
17947b79ee
Add sao_shared_generics.h in Makefile.am
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a8dd6ce351
Add a note about having implemented a separate AVX2 version of SAO offset array calculation
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a858e7dd4b
Combine duplicate code into inline functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
de0e97f711
Move 8/16/24-bit loads and stores into separate functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
10979f58fe
Tidy up code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
9cc11976c0
Combine the delta accumulation from edge and band ddistortion into shared func
...
This won't reduce object size, but there'll be less duplicate code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
55d877bd66
Vectorize sao_edge_ddistortion
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
aef0f301d3
Fix function signatures
...
Mark anything intended as read-only to be const, and fix alignment
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
997fd369b3
Redo calc_sao_edge_dir_avx2
...
Do it wider, 32 pixels at once!
2019-08-07 16:35:24 +03:00
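Several commits here rework calc_sao_edge_dir_avx2 and sao_edge_ddistortion_avx2. To show what such a routine must compute per pixel, here is a minimal scalar sketch of HEVC SAO edge-offset classification, plus per-category statistics for one row. The names, the argument order and the restriction to the horizontal class are assumptions made for illustration. This is not Kvazaar's code.

```c
#include <stdint.h>

/* Hypothetical scalar model of HEVC SAO edge-offset classification along one
 * direction: c is the current reconstructed pixel, a and b its two neighbours. */
static int sign3(int x) { return (x > 0) - (x < 0); }

int sao_edge_category(int a, int c, int b)
{
  /* raw index 0..4 (0 = local minimum, 4 = local maximum) mapped to the
   * edge categories: 1 = local min, 2/3 = corners, 4 = local max, 0 = none */
  static const int raw_to_category[5] = { 1, 2, 0, 3, 4 };
  return raw_to_category[2 + sign3(c - a) + sign3(c - b)];
}

/* Accumulate (original - reconstructed) sums and pixel counts per category
 * for the horizontal class of one row. The first and last pixels are skipped
 * because they lack one neighbour. */
void sao_edge_stats_row(const uint8_t *orig, const uint8_t *rec, int width,
                        int32_t sum[5], int32_t cnt[5])
{
  for (int k = 0; k < 5; k++) { sum[k] = 0; cnt[k] = 0; }
  for (int x = 1; x < width - 1; x++) {
    int cat = sao_edge_category(rec[x - 1], rec[x], rec[x + 1]);
    sum[cat] += orig[x] - rec[x];
    cnt[cat] += 1;
  }
}
```

A vectorized version has to produce the same sums and counts, only for 32 pixels per iteration.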
Pauli Oikkonen
db1e475e02
Use i32 instead of i8 for x/y offsets
...
Doesn't matter too much, because this number isn't used in SIMD
computation, only as a memory reference offset.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
12de466ef5
Reimplement non-band SAO color reconstruction in AVX2
...
Streamline things to work on 32 pixels at once instead of 8
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
e8bff99329
Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction
...
Vectorize it all, hope this helps with perf
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
7b5dffa855
Implement calc_sao_offset_array in AVX2
...
To be efficient, the AVX2 color reconstruction algorithm will need
offsets in byte, not dword, arrays. This is completely specific to 8-bit
pixels and the function signature is fundamentally distinct from the
generic algorithm, so it's better to not strategize SAO offset array
calculation.
2019-08-07 16:35:24 +03:00
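The note above says the AVX2 reconstruction wants offsets in byte arrays for 8-bit pixels. Here is a minimal scalar sketch of HEVC-style band offset with such a byte table, assuming 8-bit pixels. That gives 32 bands of 8 values, with four consecutive bands (wrapping modulo 32) starting at the band position. The helper names are hypothetical. One plausible reason for the byte layout is that a 32-entry int8 table is compact enough to keep in SIMD registers for table lookups, but that is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper: build a per-band int8 offset table for 8-bit SAO band
 * offset. Four consecutive bands starting at band_position (mod 32) get an
 * offset; all other bands get 0. */
void build_band_offsets(int8_t table[32], int band_position, const int8_t offsets[4])
{
  for (int b = 0; b < 32; b++) table[b] = 0;
  for (int k = 0; k < 4; k++) table[(band_position + k) & 31] = offsets[k];
}

/* Apply the table in place: band index is pixel >> 3 for 8-bit pixels, and
 * the result is clipped to 0..255. */
void apply_band_offsets(uint8_t *px, int n, const int8_t table[32])
{
  for (int i = 0; i < n; i++) {
    int v = px[i] + table[px[i] >> 3];
    px[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
  }
}
```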
Pauli Oikkonen
29563b7039
Make kvz_calc_sao_offset_array more obvious
...
Name temporary values from array lookups etc. that are referred to multiple
times, to make the behavior of the mechanism more transparent. Define
all the constant values at the beginning of the function and declare them
as const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
08881f5e9b
(TEMP) (TODO) (whatever) Avoid compiler warnings
...
I want the CI not to crash on its -Wall -Werror, but instead to actually
build the thing and report actual memory errors etc. to me
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
c18adc5ee0
Redo sao_band_ddistortion_avx2
...
Avoid branching and do the entire thing on 32 pixels at once in YMMs.
Also make the sao_bands function parameter const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
2827c3e3ab
Make calc_sao_bands less opaque
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
1bb9a079a8
Fix indentation
2019-08-07 16:35:24 +03:00
Reima Hyvönen
7bc959c7c5
3 sao functions are now working
2019-08-07 16:35:24 +03:00
Reima Hyvönen
0e0f2d3490
Clear the sum vector after it has been stored to memory
2019-08-07 16:35:24 +03:00
Reima Hyvönen
f146de7acb
removed some variables to prevent memory losses
2019-08-07 16:35:24 +03:00
Reima Hyvönen
247c3a7a71
Converted signed to unsigned int
2019-08-07 16:35:24 +03:00
Reima Hyvönen
ac5c216974
More memory error prevention in sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3fb1cbca35
More editing of sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
afbb6fb960
some more modifications to sao_edge_ddistortion_avx2 to prevent memory failures
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3496a57f7a
Edited sao_edge_ddistortion_avx2 to avoid memory overflow
2019-08-07 16:35:24 +03:00
Reima Hyvönen
267ba1d6ce
Modified sao_band_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
e70663b245
added some sub commands to avoid memory read errors
2019-08-07 16:35:24 +03:00
Reima Hyvönen
59dfb4570c
Converted some loads to load int8_t instead of ints
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8b253209a8
Found a load from a wrong address in calc_sao_edge_dir. It should now work like the generic version
2019-08-07 16:35:24 +03:00
Reima Hyvönen
50e0a47b7a
Took away __restrict
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8a39eb674e
Removed c-variable from calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
bc0a36830d
Clarified some 6-pixel loads
2019-08-07 16:35:24 +03:00
Reima Hyvönen
1a8b211e05
Added break to line 170
2019-08-07 16:35:24 +03:00
Reima Hyvönen
d05e750ebe
Added some switches to prevent segmentation faults when reading
2019-08-07 16:35:24 +03:00
Reima Hyvönen
203580047d
Defined some AVX functions
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c884c738b1
Updated some commands to match the standard
2019-08-07 16:35:24 +03:00
Reima Hyvönen
b412ed2f59
Replaced some setr calls with loads in calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c6cc063534
converted some hadd operations at calc_sao_edge_dir_avx2 to cast and extract
2019-08-07 16:35:24 +03:00
Reima Hyvönen
47ac109b10
Optimized sao_reconstruct_color_avx2 for sao->type == SAO_TYPE_BAND
2019-08-07 16:35:24 +03:00
Reima Hyvönen
96dc60a1ed
First working optimization
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c148aff9fb
Some optimization done to function sao_reconstruct_color_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
bf16ba6cc4
Remade sao_edge_ddistortion_avx2 and calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
79dc39a676
Some editing for sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
06ee52924e
some reconst done to calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
5fbc65d823
reconst optimization doesn't work yet
2019-08-07 16:35:24 +03:00
Reima Hyvönen
d29f834a69
Remove useless function
2019-08-07 16:35:24 +03:00
Reima Hyvönen
a232a12160
calc_sao_edge_dir_avx2 updated
2019-08-07 16:35:24 +03:00
Reima Hyvönen
b1febc02a5
sao_edge_ddistortion_avx2 now working properly
2019-08-07 16:35:24 +03:00
Reima Hyvönen
cd6092a1ec
Still too many bits, looking for where they appear
2019-08-07 16:35:24 +03:00
Reima Hyvönen
7853be8eeb
Incomplete optimization
2019-08-07 16:35:24 +03:00
Marko Viitanen
dfa5621024
Intrapred cleanup
2019-07-16 14:23:10 +03:00
Ari Lemmetti
40609aa865
Add missing headers to Makefile.am
2019-07-12 19:15:51 +03:00
Ari Lemmetti
5db3a78499
Bump versions for release 1.3
2019-07-09 22:09:32 +03:00
Ari Lemmetti
d513ab1999
Add missing newline
2019-07-09 21:06:05 +03:00
Ari Lemmetti
4967072625
Do not bypass search on skip cu if early_skip is not enabled
2019-07-09 20:20:12 +03:00
Ari Lemmetti
b20992a9f3
Rename functions to be more descriptive
2019-07-09 20:20:11 +03:00
Ari Lemmetti
a348a0ec23
Fix transform depth in early skip
2019-07-09 20:05:48 +03:00
Pauli Oikkonen
8d48bee180
Tidy fast coeff cost code
2019-07-09 18:01:54 +03:00
Pauli Oikkonen
201a43b08e
Clean up the RD-estimation code
2019-07-09 18:01:54 +03:00
Pauli Oikkonen
b111df5073
Create preliminary version of improved cost estimator
2019-07-09 18:01:54 +03:00
Ari Lemmetti
be08a87d94
Add missing parameter max-merge to the help message
2019-07-09 16:28:46 +03:00
Ari Lemmetti
d0bb9b4a6d
Add parameter max-merge to presets
2019-07-09 16:26:03 +03:00
Ari Lemmetti
4097331fd6
Early skip
2019-07-09 15:59:31 +03:00
Marko Viitanen
10d850e98a
Use index_offset in intra angular and change the offset to width+1
2019-07-08 14:23:19 +03:00
Marko Viitanen
3d1fa2a9cf
Fixing angular intra prediction reference pixels
2019-07-08 14:00:02 +03:00
Marko Viitanen
0656c54cab
Fix some problems with reference pixels in angular intra prediction kvz_angular_pred_generic()
2019-07-05 15:54:51 +03:00
Marko Viitanen
89ca2d4ba1
Use correct type for modedisp2sampledisp array
2019-07-05 14:12:10 +03:00
Marko Viitanen
2e8a0d08f9
Fix mvp_idx_model initialization and use
2019-07-05 14:11:29 +03:00
Joose Sainio
977e885ea2
Fix issue with gop=0 introduced in 1c36f68d0c
2019-07-05 12:57:27 +03:00
Marko Viitanen
c6217e236f
Enable 4-tap filtering for the intra angular
2019-07-04 16:26:10 +03:00
Marko Viitanen
cda6d951c0
Change DCT arrays back to 8-bit -> some frames are now correct
2019-07-04 15:59:10 +03:00
Marko Viitanen
8280bd3217
Add channel info to angular_pred and fix the displacement tables.
...
Also includes 4-tap intra filtering code commented out
2019-07-04 09:35:47 +03:00
Marko Viitanen
5e4369d6b0
Fix the kvz_cabac_encode_aligned_bins_ep function -> cabac coding now correct
2019-07-03 15:55:52 +03:00
Marko Viitanen
3fad4b0a98
Disable kvz_cabac_encode_aligned_bins_ep for now and add a ToDo message
2019-07-03 15:44:35 +03:00
Sami Ahovainio
ce1e67cc3a
Modified header flags to match VTM commit b9080ff45bec368c44f0c43a32dcd6804ef9f5d6
2019-07-01 13:58:15 +03:00
Sami Ahovainio
3863064d90
Fixed bugs in split decision and coefficient coding.
2019-07-01 13:00:43 +03:00
Mikko Pitkänen
a7f09c8114
Merge branch 'threadwrapper'
2019-06-24 16:54:59 +03:00
Sami Ahovainio
db5c0230e5
Fixed coefficient sign hiding
2019-06-20 16:26:01 +03:00
Sami Ahovainio
b51254cafd
Fixed significant coefficient group context calculation
2019-06-20 15:47:13 +03:00
Sami Ahovainio
5e0bea962c
Fixed split context decision
2019-06-20 15:30:49 +03:00
Sami Ahovainio
12322144f0
Removed debug print from context.c
2019-06-20 15:18:22 +03:00
Sami Ahovainio
3a9800d07d
Fixed coefficient coding. Fixed headers to match VTM commit e65075531471a68632bc9252d607655a0feeabc6
2019-06-20 14:43:03 +03:00
Mikko Pitkänen
3dd606ce2e
Add new threadwrapper
2019-06-18 18:45:45 +03:00
Sami Ahovainio
2c78aa0642
Fixes to coeff coding.
2019-06-13 12:01:29 +03:00
Joose Sainio
c94077d15e
remove hardcoded value
2019-06-12 14:37:41 +03:00
Joose Sainio
ac68c8444d
remove negation that wasn't supposed to be there
2019-06-12 14:35:24 +03:00
Joose Sainio
5851dcc3be
missing negation
2019-06-12 14:08:18 +03:00
Joose Sainio
1c36f68d0c
Fix owf>=9 gop=8 and add a test to catch such problems in the future
2019-06-12 14:04:41 +03:00
Sami Ahovainio
3564b4829e
Fixed split context decision. Modified intra mode initialization to match VTM version aa76fc5c04cf43390f43d63f9977bea8ee31997a.
2019-06-12 12:59:16 +03:00
Sami Ahovainio
a8a53e15b5
Fixed headers to match VTM commit aa76fc5c04cf43390f43d63f9977bea8ee31997a. Added multi_ref_line flag coding.
2019-06-07 13:37:45 +03:00
Ari Lemmetti
933ff6ed55
Merge branch 'set-qp-in-cu-fix'
2019-06-07 09:01:03 +03:00
Sami Ahovainio
8d2581e58c
Fixed issue with kvz_go_rice_par_abs where passing an unsigned argument caused MIN function to return wrong value. Modified coefficient coding to match VTM 5.0. Some issues still remain.
2019-06-05 15:57:18 +03:00
Sami Ahovainio
367f1b2129
Fixed splitting bug caused by wrong values in the headers. Fixed header flags to match VTM commit 5703e81b2de677d976ec15423f5768b17619ba6a
2019-06-05 11:21:02 +03:00
Sami Ahovainio
76d56290ed
Fixed VUI header writing. Fixed debug prints of NAL headers and rbsp_stop_one_bit.
2019-05-31 11:13:11 +03:00
Ari Lemmetti
c6da839002
Set lcu sqrt lambda according to lcu lambda instead of frame lambda when ROI is used
2019-05-29 18:32:10 +03:00
Marko Viitanen
8282a18c36
Fixed headers and NAL writing to match the latest VTM master 988c22cbb9c58584cac3ef0ec7794cafbea6dfd6
2019-05-29 16:18:35 +03:00
Sami Ahovainio
4768ba0628
Minor fixes to header writing. Added contexts for multi_ref_line and BDPCM. Functions added for writing both in bitstream, but they are both disabled for now.
2019-05-29 13:00:19 +03:00
Sami Ahovainio
3339e12169
Fixed some header flags
2019-05-27 09:56:56 +03:00
Ari Lemmetti
9339845e8b
Set QP completely at CU level as the name '--set-qp-in-cu' implies
...
-Move slice delta QP to CU level when using --set-qp-in-cu
-Separate functionality from roi
2019-05-24 20:38:39 +03:00
Pauli Oikkonen
081d16fc33
Fix intrinsics that may be missing on some systems
...
Create a header to collect all the workarounds for missing intrinsics
in one place
2019-05-23 19:59:40 +03:00
Sami Ahovainio
5b46fbd878
Added multi_ref_idx variable for intra coding (is 0 throughout the code for now). Modified prediction flag writing. Chroma pred flag remains unchanged (ToDo). Added bitstream debug printing on VERBOSE mode.
2019-05-21 12:28:05 +03:00
Sami Ahovainio
ed4e218702
Updated coefficient coding to match VTM 5.0
2019-05-13 15:30:43 +03:00
Sami Ahovainio
504c3dfd1b
Modified the headers to match current VTM headers
2019-05-07 16:30:06 +03:00
Marko Viitanen
30a8a7b97c
WIP fixing the last significant xy coding
2019-05-07 15:01:02 +03:00
Pauli Oikkonen
87a9208db8
Eliminate cvtsi64_si128 intrinsic
...
Apparently it'll cause Win32 builds to break because it emits the movq
instruction or something..
2019-04-17 16:30:40 +03:00
Pauli Oikkonen
7175d20bb2
Still include stdint.h for non-vector builds
2019-04-15 19:36:01 +03:00
Pauli Oikkonen
1315c7e2b0
Do not compile any vector code for non-SSE4/AVX2 builds
2019-04-15 19:10:48 +03:00
Pauli Oikkonen
f5f70e7bc5
Merge branch 'sad-optimization'
2019-04-15 19:02:01 +03:00
Jan Beich
85f46e17a9
Detect AltiVec via elf_aux_info() on FreeBSD 12+
2019-04-01 13:08:04 +00:00
Jan Beich
82486255da
Simplify AltiVec detection on Linux
2019-04-01 13:08:04 +00:00
Marko Viitanen
1546acfdb9
New NAL unit IDs and header changes
2019-03-28 10:11:36 +02:00
Marko Viitanen
36eab9c170
New cabac context models with "rate"
2019-03-27 12:38:19 +02:00
Marko Viitanen
3bdc8ac8d3
Fix intra_chroma_pred_mode and cbf contexts
2019-03-26 09:10:09 +02:00
Marko Viitanen
d15f58517f
Changed intra coding to use 6 MPM, implemented merge sort and MPM selection
2019-03-20 15:20:31 +02:00
Marko Viitanen
1081336868
Updated intra pred mode init values
2019-03-20 15:18:32 +02:00
Marko Viitanen
f3acd245ae
New cabac coding function: kvz_cabac_encode_trunc_bin
2019-03-20 15:17:54 +02:00
Marko Viitanen
80d6e4bf05
New split flag calculations
2019-03-20 09:07:58 +02:00
Marko Viitanen
8c84348010
New entropy bit table
2019-03-20 09:07:22 +02:00
Marko Viitanen
2d0348aa6d
New context models
2019-03-20 09:06:57 +02:00
Marko Viitanen
052080747e
New CABAC functions
2019-03-20 09:06:26 +02:00
Marko Viitanen
20667fdba6
Update header bits to VTM 4.0+
2019-03-11 14:02:12 +02:00
Pauli Oikkonen
6d43759604
Create a border-respecting 32-wide AVX hor_sad
2019-03-07 18:01:22 +02:00
Pauli Oikkonen
f218cecb38
Remove offending hor_sad_avx2_w32 function
...
Consider possibly creating a non-offending AVX2 version instead, the
way hor_sad_sse41_w32 works. Or maybe there's more essential work to
do.
2019-03-05 22:51:41 +02:00
Pauli Oikkonen
df2e6c54fd
4-unroll hor_sad_sse41_arbitrary
...
This may not increase perf though, because it's such a rarely used
function that keeping the icache footprint small may be more essential...
2019-03-05 22:45:23 +02:00
Pauli Oikkonen
448eacba7b
Avoid overreading block borders in hor_sad_sse41_arbitrary
2019-03-05 22:34:50 +02:00
Eemeli Kallio
c159e275b7
Merge branch 'max_merge'
2019-03-05 14:39:03 +02:00
Pauli Oikkonen
41f51c08c4
Avoid overrunning buffer in hor_sad_sse41_w32
2019-03-01 15:37:38 +02:00
Pauli Oikkonen
bcd9879359
Include quant coeff range check in non-scaling list execution path too
2019-02-27 17:26:44 +02:00
Pauli Oikkonen
24e6363f64
Remove the kvz_quant_avx2 wrapper function
2019-02-27 16:32:58 +02:00
Pauli Oikkonen
748820f3c5
Eliminate unnecessary loading of coeffs if scaling lists are off
2019-02-27 16:26:35 +02:00
Pauli Oikkonen
5994350f40
Allow quant_flat_avx2 to be used with scaling lists on
2019-02-27 16:25:59 +02:00
Eemeli Kallio
7f4e0acf41
Added check if max-merge is out of bounds
2019-02-19 13:53:42 +02:00
Pauli Oikkonen
9b0e079262
Use SSE instructions for 64-bit SADs instead of MMX
...
VC++ seems to choke on MMX instructions
2019-02-18 20:13:33 +02:00
Pauli Oikkonen
d8b8923028
Add LGPL notices to reg_sad headers
2019-02-18 17:52:47 +02:00
Eemeli Kallio
2a40560888
some variables to const
2019-02-12 11:24:10 +02:00
Eemeli Kallio
8f8e7bb53c
Added the possibility to reduce the maximum number of merge candidates.
2019-02-12 09:21:03 +02:00
Marko Viitanen
1165219842
Update PTL, SPS ext and SPS flags to match VTM 4rc1
2019-02-07 10:00:04 +02:00
Pauli Oikkonen
770db825b9
Create hor_sad_w8 and w4 epol mask the way w16 works
2019-02-06 19:34:26 +02:00
Pauli Oikkonen
aa19bcac8a
Avoid branching in creating shuffle mask in hor_sad_w16
2019-02-06 18:58:46 +02:00
Pauli Oikkonen
2d05ca8520
Remove width from constant-width hor_sad func params
...
They should kinda know it already
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
57db234d95
Move 32-wide SSE4.1 hor_sad to picture-sse41.c
...
It's not used by picture-avx2.c that also includes the header, so
it should not be in the header
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
dd7d989a39
Implement 32-wide hor_sad on AVX2
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ff70c8a5ec
Utilize horizontal SAD functions for SSE4.1 as well
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
f5ff4db01f
4-wide hor_sad border agnostic
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
35e7f9a700
Fix hor_sad w8 to work with both borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
836783dd6e
Use hor_sad_w32 for both left and right borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
69687c8d24
Modify hor_sad_sse41_w16 to work over left and right borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
51c2abe99a
Modify image_interpolated_sad to use kvz_hor_sad
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
1e0eb1af30
Add generic strategy for hor_sad'ing a non-split width block
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
686fb2c957
Unroll arbitrary-width SSE4.1 hor_sad by 4
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
768203a2de
First version of arbitrary-width SSE4.1 hor_sad
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ccf683b9b6
Start work on left and right border aware hor_sad
...
Comes with 4, 8, 16 and 32 pixel wide implementations now, at some point
investigate if this can start to thrash icache
2019-02-04 20:41:40 +02:00
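The border-aware hor_sad work concerns blocks whose reference area runs past the left or right edge of the frame. As a reference for what the vectorized versions must equal, here is a minimal scalar sketch. It assumes the out-of-frame columns take the value of the nearest in-frame pixel, and that `left` and `right` count the out-of-frame columns at each end of the block. The name `hor_sad_clamped` and this exact parameter meaning are hypothetical, not Kvazaar's kvz_hor_sad.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of a border-aware horizontal SAD. The first
 * `left` and last `right` columns of the reference block lie outside the
 * frame, so they are replaced by the nearest in-frame column.
 * Assumes left + right < width. */
uint32_t hor_sad_clamped(const uint8_t *pic, const uint8_t *ref,
                         int width, int height, int pic_stride, int ref_stride,
                         int left, int right)
{
  const int first_valid = left;
  const int last_valid = width - 1 - right;
  uint32_t sad = 0;
  for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
      /* clamp the reference column into the valid range */
      int rx = x < first_valid ? first_valid : (x > last_valid ? last_valid : x);
      sad += (uint32_t)abs(pic[y * pic_stride + x] - ref[y * ref_stride + rx]);
    }
  }
  return sad;
}
```

SIMD versions reach the same result with shuffles or blends instead of per-pixel clamping. The padding added in 760bd0397d lets wide loads read past the block ends without touching unmapped memory.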
Pauli Oikkonen
760bd0397d
Pad the image buffer by 64 bytes from both ends
...
This will be necessary for an efficient and straightforward
implementation of hor_sad for blocks over 16 pixels wide: they cannot
use the shuffle trick, because inter-lane shuffling is so hard to
do
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
c36482a11a
Fix bug in 24-wide SAD
...
*facepalm*
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
f781dc31f0
Create strategy for ver_sad
...
Easy to vectorize
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ca94ae9529
Handle extrapolated blocks with unmodified width using optimized_sad pointer
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
91b30c7064
Tidy up kvz_image_calc_sad
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
9db0a1bcda
Create get_optimized_sad func for SSE4.1
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
91380729b1
Add generic get_optimized_sad implementation
...
NOTE: To force generic SAD implementation on devices supporting
vectorized variants, you now have to override both get_optimized_sad
and reg_sad to generic (only overriding get_optimized_sad on AVX2
hardware would just run all SAD blocks through reg_sad_avx2). Let's
see if there's a more sensible way to do it, but it's not trivial.
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
45f36645a6
Move choosing of tailored SAD function higher up the calling chain
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
91cb0fbd45
Create strategy for directly obtaining pointer to constant-width SAD function
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
94035be342
Unify unrolling naming conventions
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
517a4338f6
Unroll SSE SAD for 8-wide blocks to process 4 lines at once
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
0f665b28f6
Unroll arbitrary width SSE4.1 SAD by 4
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
cbca3347b5
Unroll 64-wide AVX2 SAD by 2
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
84cf771dea
Unroll 32 and 16 wide SAD vector implementations by 4
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
5df5c5f8a4
Cast all pointers to const types in vector SAD funcs
...
Also tidy up the pointer arithmetic
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
a711ce3df5
Inline fixed width vectorized SAD functions
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
6504145cce
Remove 16-pixel wide AVX2 SAD implementation
...
At least on Skylake, it's noticeably slower than the very simple
version using SSE4.1
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
4cb371184b
Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
796568d9cc
Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
4d45d828fa
Use constant-width SSE4.1 SAD funcs for AVX2
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
2eaa7bc9d2
Move SSE4.1 SAD functions to separate header
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
d2db0086e1
Create constant width SAD versions for 8 and 16 pixels
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
a13fc51003
Include a blank AVX2 strategy registration function even in non-AVX2 builds
2019-02-04 19:52:24 +02:00
Pauli Oikkonen
d55414db66
Only build AVX2 coeff encoding when supported
...
..whoops
2019-02-04 19:34:30 +02:00
Pauli Oikkonen
3fe2f29456
Merge branch 'encode-coeffs-avx2'
2019-02-04 18:52:31 +02:00
Pauli Oikkonen
722b738888
Fix more naming issues
2019-02-04 16:05:43 +02:00
Pauli Oikkonen
e26d98fb75
Rename a couple variables and add crucial comments
2019-02-04 15:57:07 +02:00
Pauli Oikkonen
f186455619
Move encode_last_significant_xy out of strategy modules
...
It's the exact same in both AVX2 and generic, and does not seem to
be worth even trying to vectorize
2019-02-04 14:55:41 +02:00
Pauli Oikkonen
3f7340c932
Fine-tune pack_16x16b_to_16x2b
...
Avoid mm_set1 operation when it's possible to create the constant with
one bit-shift operation from another instead. Thanks Intel for
3-operand instruction encoding!
2019-02-04 14:44:47 +02:00
Pauli Oikkonen
314f5b0e1f
Rename 16x2b cmpgt function, comment it better, optimize it slightly
...
Eliminate an unnecessary bit masking to make it even more messy
2019-02-04 14:44:32 +02:00
Pauli Oikkonen
d8ff6a6459
Fix _andn_u32 to work on old Visual Studio
2019-02-01 15:34:42 +02:00
Pauli Oikkonen
26e1b2c783
Use (u)int32_t instead of (unsigned) int in reg_sad_sse41
2019-01-10 14:37:04 +02:00
Pauli Oikkonen
3a1f2eb752
Prefer SSE4.1 implementation of SAD over AVX2
...
It seems that the 128-bit wide version consistently outperforms the
256-bit one
2019-01-10 13:48:55 +02:00
Pauli Oikkonen
9b24d81c6a
Use SSE instead of AVX for small widths
...
Highly dubious if this will help performance at all
2019-01-07 20:12:13 +02:00
Pauli Oikkonen
b2176bf72a
Optimize SSE4.1 version of SAD
...
Make it use the same vblend trick as AVX2. Interestingly, on my test
setup this seems to be faster than the same code using 256-bit AVX
vectors.
2019-01-07 19:40:57 +02:00
Pauli Oikkonen
887d7700a8
Modify AVX2 SAD to mask data by byte granularity in AVX registers
...
Avoids using any SAD calculations narrower than 256 bits, and
simplifies the code. Also improves execution speed
2019-01-07 18:53:15 +02:00
Pauli Oikkonen
7585f79a71
AVX2-ize SAD calculation
...
Performance is no better than SSE though
2019-01-07 16:26:24 +02:00
Pauli Oikkonen
ab3dc58df6
Copy SAD SSE4.1 impl to AVX2
2019-01-03 18:31:57 +02:00
Pauli Oikkonen
45ac6e6d03
Tidy pack_16x16b_to_16x2b comments
2019-01-03 16:37:05 +02:00
Ari Lemmetti
cd818db724
Add missing quantization and residual in cost calculation (inter rd=2).
2018-12-21 15:55:29 +02:00
Pauli Oikkonen
016eb014ad
Move packing 16x16b -> 16x2b into separate function
2018-12-20 10:51:44 +02:00
Ari Lemmetti
b234897e8a
Fix smp and amp blocks in fme and revert previous change.
...
Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc.
Calculate SATD on the 8x4, ... part
2018-12-19 21:30:53 +02:00
Pauli Oikkonen
9aaa6f260d
Fixes to enable portability
2018-12-18 20:42:09 +02:00
Pauli Oikkonen
2fdbbe9730
Move CG reordering code from quant-avx2 to shared header
2018-12-18 19:42:18 +02:00
Pauli Oikkonen
d02207306d
Create a header file for shared AVX2 code
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
361bf0c7db
Precompute >=2 coeff encoding loop with 2-bit arithmetic
...
Who needs 16x16b vectors when you can do practically the same with
16x2b pseudovectors in 32-bit general purpose registers!
2018-12-18 19:41:09 +02:00
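The commit above packs 16 values into 2-bit lanes of a 32-bit general purpose register. Here is a minimal sketch of the idea. It assumes each coefficient magnitude is saturated to 3 before packing, so a lane only records 0, 1, 2 or "3 or more". Per-lane predicates then become a shift, an OR/AND and a mask. All helper names are hypothetical, and this is not Kvazaar's pack_16x16b_to_16x2b.

```c
#include <stdint.h>

/* Hypothetical helper: saturate 16 coefficient magnitudes to 0..3 and pack
 * them into 2-bit lanes of one word (lane i occupies bits 2i and 2i+1). */
uint32_t pack_16_to_2bit(const int16_t coeffs[16])
{
  uint32_t packed = 0;
  for (int i = 0; i < 16; i++) {
    int32_t a = coeffs[i] < 0 ? -(int32_t)coeffs[i] : coeffs[i];
    uint32_t v = a > 3 ? 3u : (uint32_t)a;
    packed |= v << (2 * i);
  }
  return packed;
}

/* Each predicate returns one flag bit per lane, at the lane's low bit. */
uint32_t lanes_ge1(uint32_t p) { return (p | (p >> 1)) & 0x55555555u; } /* value >= 1 */
uint32_t lanes_ge2(uint32_t p) { return (p >> 1) & 0x55555555u; }       /* high bit set */
uint32_t lanes_eq3(uint32_t p) { return p & (p >> 1) & 0x55555555u; }   /* both bits set */

/* Portable population count, used to count flagged lanes. */
int popcount32(uint32_t x)
{
  int n = 0;
  while (x) { x &= x - 1; n++; }
  return n;
}
```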
Pauli Oikkonen
940b0e9e6a
Require BMI2 for AVX2 build
...
Any processor implementing AVX2 should also implement BMI2
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
f66cb23d5b
Optimize greater1 encoding loop
...
Calculating the c1 variable need not be a serial operation!
2018-12-18 19:41:09 +02:00
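The commit above states that c1 need not be computed serially. Here is a minimal sketch of why, assuming the HM-style update rule within one coefficient group: c1 starts at 1, becomes 0 once a coefficient with magnitude above 1 is seen, and otherwise increments up to 3. The value used for the i-th coefficient then depends only on i and on whether any earlier coefficient exceeded 1. The function names are hypothetical.

```c
#include <stdint.h>

/* Serial reference: HM-style update of c1 over the nonzero coefficients of
 * one coefficient group (all abs values are >= 1). */
void c1_serial(const int32_t *abs_coeffs, int n, int32_t *c1_out)
{
  int32_t c1 = 1;
  for (int i = 0; i < n; i++) {
    c1_out[i] = c1;                       /* state used to code coefficient i */
    if (abs_coeffs[i] > 1) c1 = 0;
    else if (c1 > 0 && c1 < 3) c1++;
  }
}

/* Closed form: no loop-carried dependency on c1. Assumes n <= 32. */
void c1_parallel(const int32_t *abs_coeffs, int n, int32_t *c1_out)
{
  uint32_t gt1_mask = 0;                  /* bit j set when abs_coeffs[j] > 1 */
  for (int j = 0; j < n; j++)
    gt1_mask |= (uint32_t)(abs_coeffs[j] > 1) << j;
  for (int i = 0; i < n; i++) {
    uint32_t earlier = gt1_mask & ((1u << i) - 1u);
    int32_t ramp = (1 + i < 3) ? (1 + i) : 3;
    c1_out[i] = earlier ? 0 : ramp;
  }
}
```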
Pauli Oikkonen
8c8b791c35
Vectorize kvz_context_get_sig_ctx_inc
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
033261eb74
Eliminate two branches using bit magic
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
c4434e8d04
Scan CGs in forward order to simplify finding last significant
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
efd097f5a5
Vectorize the coeff group loop to some extent
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
a01362e638
use the efficient method of reordering raster->scan
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
50a888e789
Use the efficient method to find first and last nz coeffs in block
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
7e9203f566
Scan coeff groups in scan order to help find last significant one
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
9a5a6fdbc7
Simplify two ifs in encode_coeff_nxn-avx2
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
37a2a8bac8
See if loop can be optimized by rearranging
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
584f2f74b6
Vectorize significant coeff group scanning loop
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
1bfed73221
Add AVX2 strategy for encode_coding_tree
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
c3a6f3112a
Add generic strategy group for encode_coding_tree
2018-12-18 19:41:09 +02:00
Marko Viitanen
1ef851ab4b
Disable FME on amp/smp blocks with width or height not divisible by 8
2018-12-18 10:28:21 +02:00
Joose Sainio
b71c5573f0
Merge branch 'rate_control_fix'
2018-12-17 12:39:27 +02:00
Sergei Trofimovich
68a70e45a1
x86 asm: mark stack as non-executable
...
Gentoo's `scanelf` QA tool detects writable/executable stack
of assembly-written files as:
```
$ scanelf -qRa .
0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o
0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o
0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-sad.o
0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-satd.o
```
Normally C compiler emits non-executable stack marking (or GNU assembler
via `-Wa,--noexecstack`).
The change adds non-executable stack marking for yasm-based assembly files.
https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2018-12-16 11:31:56 +00:00
Reima Hyvönen
1fcc5c6a8d
Merge branch 'bipred_recon'
2018-12-11 09:59:35 +02:00
Reima Hyvönen
e4a10880f3
Added case 12 to bipred_recon no mov
2018-12-11 09:52:17 +02:00
Marko Viitanen
a4f3968e52
Fix Visual Studio errors by initializing some variables used in AVX2 signhiding
2018-12-11 09:33:26 +02:00
Ari Lemmetti
ac943147e3
Calculate satd cost for whole non-square blocks as well.
2018-12-10 17:04:29 +02:00
Pauli Oikkonen
c465578048
Add a descriptive comment to coefficient reordering
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
f78bf2ebcb
Optimize q_coefs usage for indexed fetch
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
d9591f1b49
Eliminate midway buffering of reordered coefs
...
TODO: For some mysterious reason seems slightly slower than the
buffered one
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
7fe454c51f
Optimize get_cheapest_alternative()
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
6bbd3e5a44
Optimize rearrange_512 function
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
cb8209d1b3
Vectorize transform coefficient reordering loop
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
7cf4c7ae5f
Rename "reduce" functions to hsum
...
That's what the functions fundamentally do anyway
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
316cd8a846
Fix ALIGNED keyword and grow alignment to 64B
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
1befc69a4c
Implement sign bit hiding in AVX2
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
c5cd03497e
Require BMI and ABM instruction sets for AVX2 build
...
AVX2 support on a processor should always imply BMI and ABM support.
The lzcnt and tzcnt instructions have more suitable semantics in the
corner case where the source word is 0, and allow us to even handle that
scenario without a branch. Apparently Visual Studio will already
include this support when building with AVX2 enabled, so only the
automake files need to be tweaked.
2018-12-03 15:36:32 +02:00
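The commit above relies on LZCNT and TZCNT being defined for a zero input. Both return the operand width, 32, whereas BSR/BSF and `__builtin_clz`/`__builtin_ctz` leave that case undefined. Here is a minimal portable sketch of those semantics. The function names are hypothetical, and the loops only model the result, not the single-instruction speed.

```c
#include <stdint.h>

/* Portable models of LZCNT (ABM) and TZCNT (BMI1): both return 32 for 0. */
uint32_t lzcnt32_model(uint32_t x)
{
  uint32_t n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && !(x & bit); bit >>= 1)
    n++;
  return n;
}

uint32_t tzcnt32_model(uint32_t x)
{
  uint32_t n = 0;
  for (uint32_t bit = 1u; bit != 0 && !(x & bit); bit <<= 1)
    n++;
  return n;
}

/* Position of the highest set bit of a significance mask, or -1 when the
 * mask is empty. Because lzcnt(0) == 32, no special case is needed. */
int last_significant_pos(uint32_t sig_mask)
{
  return 31 - (int)lzcnt32_model(sig_mask);
}
```

On hardware with these extensions, `_lzcnt_u32` and `_tzcnt_u32` from `<immintrin.h>` (GCC/Clang flags `-mlzcnt` and `-mbmi`) give the same results with one instruction each.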
Reima Hyvönen
f8696b54a4
Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks whose width is not equal to 8 (e.g. width = 12)
2018-11-20 17:09:19 +02:00
Marko Viitanen
a5a10a33c3
Enable --scaling-list parameter and add to the documentation
2018-11-19 10:47:30 +02:00
Reima Hyvönen
710ba288db
Chroma has some problems
2018-11-15 16:42:48 +02:00
Sami Ahovainio
8f98d4aac7
Added square search
2018-11-14 14:50:31 +02:00
Marko Viitanen
6871490dd5
Simplify get_mvd_coding_cost(), only include golomb coding
2018-11-14 14:33:31 +02:00
Ari Lemmetti
a832206bb6
Replace 32-bit incompatible intrinsics
2018-11-12 18:54:33 +02:00
Ari Lemmetti
5c774c4105
Rewrite most of FME and interpolation filters
...
Changes had to break a lot of stuff and were just squashed into this horrible code dump
2018-11-08 20:21:16 +02:00
Joose Sainio
1c8a1f24e2
Don't assume anything about bits spent
2018-11-07 16:03:38 +02:00
Joose Sainio
3471e2470d
Fix using uninitialized value for the first frame
2018-11-07 08:17:39 +02:00
Joose Sainio
d95ac11a3b
Fix rate_control for other LP-GOPS
2018-11-06 14:20:44 +02:00
Joose Sainio
67a6ba667e
Fix rate control for flat lp-gop
2018-11-06 09:38:17 +02:00
Reima Hyvönen
7406c33a42
Some more cleaning
2018-10-26 12:25:18 +03:00
Reima Hyvönen
4c71546b2e
Cleaned up some code
2018-10-26 12:19:44 +03:00
Reima Hyvönen
4fe3909e48
Switched luma to use 32-bit ints instead of 16-bit ones
2018-10-24 18:24:46 +03:00
Marko Viitanen
465bc2cfee
[EMT] make functions static and prefix arrays with kvz_g
2018-10-18 10:54:33 +03:00
Marko Viitanen
b133e7de1e
VTM 2.2 changed -> remove high_precision_motion_vectors flag
2018-10-17 12:41:14 +03:00
Marko Viitanen
169febd1c4
[EMT] Simplify DCT8, DCT5, DST1 and DST7 definitions
2018-10-17 12:17:54 +03:00
Marko Viitanen
e015d7eb2b
Fix compiler warnings
2018-10-17 10:43:11 +03:00
Marko Viitanen
ad310c77d3
Added EMT transforms to the strategies
2018-10-17 08:56:49 +03:00
Eemeli Kallio
284e73839e
Calculating zero cost moved to its own function
2018-10-16 11:02:01 +03:00
Reima Hyvönen
381e786e10
Trying to find the bug in luma
2018-10-11 18:08:41 +03:00
Marko Viitanen
c589e5ed36
Fix closed-gop frame feed, the ordering was incorrect after the first GOP
2018-10-10 11:12:03 +03:00
Reima Hyvönen
2f5f81bac3
removed the non-optimized bipred function
2018-10-09 11:19:23 +03:00
Marko Viitanen
75dce4f3ce
Fix low-delay-gop usage with --no-open-gop
2018-10-04 15:16:02 +03:00
Marko Viitanen
de71b58f76
Change closed GOP structure to include an additional IDR between GOPs
2018-10-04 11:17:03 +03:00
Marko Viitanen
1e1a80e4a6
[TMVP] fix clamping of block offsets and clean up the code a bit
2018-10-03 12:34:48 +03:00
Reima Hyvönen
212a8e68fa
Modified to avoid memory overflow, still some bug inside luma
2018-10-02 20:23:32 +03:00
Marko Viitanen
954f07e3d7
Add --(no-)open-gop option
2018-10-02 10:05:32 +03:00
Marko Viitanen
027359c3c3
Implement TMVP duplicate checking as in VTM 2.1
2018-09-28 11:50:36 +03:00
Marko Viitanen
571a545416
Fix spatial merge candidate selection
2018-09-26 15:10:31 +03:00
Marko Viitanen
63760ca0cf
Use kvz_cabac_bins_verbose flag to control cabac debug printing
2018-09-26 12:01:23 +03:00
Marko Viitanen
7c37f456f9
Fix implicit Qt split for p-frames
2018-09-26 12:00:18 +03:00
Marko Viitanen
b6f2c66c73
Fixed intra Most Probable Mode (mpm) derivation to conform to VTM 2.1
2018-09-21 10:33:54 +03:00
Sami Ahovainio
a2b2275d87
Fixed array sizes in search_intra_rough from 35 to 67
2018-09-18 11:49:15 +03:00
Sami Ahovainio
82fb80ab6e
Fixed a couple of if-clauses that still used the old intra mode range.
2018-09-17 08:56:43 +03:00
Marko Viitanen
a437d4c508
Fixed intra chroma mode bitstream writing (chroma search not used)
2018-09-13 15:05:00 +03:00
Marko Viitanen
389aeebe07
Added 2x2 transform functions
2018-09-13 14:51:07 +03:00
Marko Viitanen
445c059b4a
Fix transforms for VTM 2.0, generated new transform matrices and added a shift by 2 for forward and inverse
2018-09-13 14:39:49 +03:00
Marko Viitanen
35fa8e9785
Fix kvz_intra_get_dir_luma_predictor -> Intra working
2018-09-13 12:32:17 +03:00
Marko Viitanen
f75b0b11c3
Simplify intra filtered ref pixel selection
2018-09-13 10:09:52 +03:00
Sami Ahovainio
4bb484a86a
Fixed if-clause at search_intra.c to use new wider range of intra modes
2018-09-13 09:58:48 +03:00
Marko Viitanen
82de0fbee7
Switch intra search to use the actual 67 modes
2018-09-13 09:43:45 +03:00
Marko Viitanen
382917bcd3
New table for choosing angular intra filtered references and a small bugfix on the end condition of angular intra
2018-09-13 09:35:55 +03:00
Marko Viitanen
4aad2fa383
Fix intra mode writing
2018-09-12 10:34:58 +03:00
Marko Viitanen
d4ed0ee3ad
Fixed some array offsets in intra angular prediction
2018-09-12 08:53:17 +03:00
Marko Viitanen
20c96366ed
fix kvz_context_get_sig_ctx_idx_abs() parameter for "type" -> decoding with VVC
2018-09-10 12:51:02 +03:00
Marko Viitanen
a7ca09108c
Improve CABAC debugging by including similar info as in VTM
2018-09-10 11:00:00 +03:00
Sami Ahovainio
ce84407c69
Fixed coeff_remain writing to use the correct rice_param instead of using 0 all the time.
2018-09-07 11:24:24 +03:00
Sami Ahovainio
78ea24bcf1
Fixed sig_coeff_flag writing condition.
2018-09-06 15:48:45 +03:00
Marko Viitanen
4bebb4bb2c
Fix temp_diag and temp_sum initialization and coeff array usage in context derivation
2018-09-05 17:09:50 +03:00
Marko Viitanen
f5b6c386bc
Fix incorrect sig_flag implicitness parameters and some temp variable initializations
2018-09-03 16:22:05 +03:00
Marko Viitanen
8bef85e056
Merge branch 'set-qp-in-cu'
2018-09-03 08:33:33 +03:00
Ari Lemmetti
2fdcc2b79d
Add option --set-qp-in-cu
2018-09-03 08:32:45 +03:00
Marko Viitanen
52be2f0bbe
Fixed kvz_encode_coeff_nxn and renamed some variables to match VTM
2018-08-31 15:10:17 +03:00
Sami Ahovainio
787264f568
Fixed dst indexing in kvz_angular_pred_generic
2018-08-31 10:36:28 +03:00
Sami Ahovainio
d2291fea83
Intra mode scaling moved from angular prediction to kvz_intra_predict. pdpc implemented in kvz_intra_predict.
2018-08-31 10:01:28 +03:00
Marko Viitanen
49a116ed3a
Bugfix correct array sizes for cu_ctx_last_x/y
2018-08-30 16:14:08 +03:00
Sami Ahovainio
84cef127dc
Fixed cu_gtx_flag_model_chroma initialization.
2018-08-30 15:21:16 +03:00
Marko Viitanen
7d491e639b
Add new values to last_x/y coding
2018-08-30 15:04:04 +03:00
Marko Viitanen
809805b185
Bugfixes for kvz_encode_coeff_nxn()
2018-08-30 14:50:29 +03:00
Marko Viitanen
0680f240d7
Converted kvz_encode_coeff_nxn and related helper functions to VVC K0072 format
2018-08-30 14:24:03 +03:00
Marko Viitanen
84e78c6c50
Disable writing of cabac flags not currently available
2018-08-30 11:21:44 +03:00
Marko Viitanen
e3dbaf99a9
Started implementing new coeff coding function
...
- added kvz_context_get_sig_ctx_idx_abs for abs sig context derivation
2018-08-30 11:09:42 +03:00
Marko Viitanen
e00319b832
Fix cu_sig_coeff_group_model init and some instances of cu_sig_model usage
2018-08-30 09:08:08 +03:00
Marko Viitanen
4429e0b89d
Expand cu_sig_coeff_group_model according to VVC
2018-08-29 16:20:34 +03:00
Sami Ahovainio
578122ed43
Context changes for chroma pred modes. BT flag init and chroma pred mode init moved inside a loop.
2018-08-29 16:00:08 +03:00
Sami Ahovainio
54ebadfc43
Clarifying comments and changes towards WAIP
2018-08-29 16:00:08 +03:00
Marko Viitanen
7f119e8bdd
Added new ctx models for sig, parity and gtx, removed models for one and abs
2018-08-29 15:57:40 +03:00
Marko Viitanen
46d02c1734
Implemented JVET-K0072 based cbf context selections
2018-08-29 10:12:07 +03:00
Marko Viitanen
bb9dc22336
Disable PCM
2018-08-29 09:59:53 +03:00
Marko Viitanen
23a1292f52
Added max_binary_tree_unit_size and more comments
2018-08-29 08:23:41 +03:00
Marko Viitanen
37caa451c6
Fix VVC split flag condition for hor and ver splits at the edges
...
- Split flag is no longer implicit when the block can be split with the BT after QT horizontally or vertically
2018-08-28 16:03:02 +03:00
Reima Hyvönen
896034b7cf
Renamed some functions back
2018-08-28 15:31:10 +03:00
Reima Hyvönen
e8b5e6db4c
Did some merging
2018-08-28 15:26:27 +03:00
Reima Hyvönen
7de5c74434
Updated bipred_recon to work faster
2018-08-28 15:12:31 +03:00
Reima Hyvönen
47b357cca2
Commented out one test
2018-08-27 18:52:14 +03:00
Reima Hyvönen
2ca99a44e8
Updated shuffle operation to be in right order
2018-08-27 18:16:38 +03:00
Sami Ahovainio
42741a2c40
Some changes for PCM and Intra towards VTM 2.0 compatibility.
2018-08-27 09:18:15 +03:00
Marko Viitanen
3dc5f65fba
Add an extra bit to intra mode and map 33 angular modes to 65
2018-08-17 15:09:48 +03:00
Marko Viitanen
9aaf53fcd7
Add dep_quant_enable_flag to slice header
2018-08-17 14:58:57 +03:00
Marko Viitanen
dc92fa6fb3
Added missing ALF flag to SPS
2018-08-17 12:53:27 +03:00
Marko Viitanen
dbc74c592d
Add VTM 2.0 new flags to SPS
2018-08-17 12:47:29 +03:00
Marko Viitanen
17505c8306
Disable vertical and horizontal scan order with small blocks
...
- Intra now working down to 8x8 luma
2018-08-17 11:38:40 +03:00
Marko Viitanen
4f7da86285
Commented out sign hiding code, which is not used in VVC
2018-08-17 09:38:11 +03:00
Marko Viitanen
c9cbdd5dc3
Added a couple of TODO comments for large CTU support
2018-08-17 09:37:14 +03:00
Marko Viitanen
daf041406f
Disable DST
2018-08-16 16:05:32 +03:00
Marko Viitanen
b85ae3688e
Signal QP in slice header if tiles and slices=tiles are enabled
...
Keeps the PPS constant for various purposes
2018-08-16 08:44:39 +03:00
Sami Ahovainio
5baab86597
Added BT split flags
2018-08-14 15:28:06 +03:00
Marko Viitanen
b33aa37484
Enable max_trans_hier_depth values and disable DC and angular filtering
2018-08-14 15:24:21 +03:00
Marko Viitanen
00a827007a
Use normal split flags
2018-08-14 10:57:32 +03:00
Reima Hyvönen
508b218a12
some modifications made to prevent reading too much
2018-08-14 10:50:39 +03:00
Reima Hyvönen
1d935ee888
some useless stuff removed
2018-08-13 16:47:11 +03:00
Reima Hyvönen
ce3ac4c05e
some modifications to no_mov
2018-08-13 16:41:02 +03:00
Reima Hyvönen
15a613ae94
test if no_mov breaks testing
2018-08-13 16:02:56 +03:00
Reima Hyvönen
97a2049e58
moved pointer declaration out of the switch
2018-08-10 16:42:26 +03:00
Reima Hyvönen
aa94bcedbc
Stream is now pointer
2018-08-10 16:38:49 +03:00
Reima Hyvönen
fa5b227ece
256 to 32 doesn't work, made them by hand
2018-08-10 16:01:20 +03:00
Reima Hyvönen
408dedbcc8
removed _mm256_extract_epi8 and replaced with _mm_stream
2018-08-10 15:53:26 +03:00
Reima Hyvönen
31c35091c6
_mm256_cvtsi256_si32 removed
2018-08-10 10:06:40 +03:00
Reima Hyvönen
99dc43074f
_mm256_cvtsi256_si32 breaks the system, too many bits; back to extract
2018-08-10 09:59:33 +03:00
Reima Hyvönen
4f1f80b2cb
Replaced the direct 256-bit conversion with a 256 -> 128 cast followed by a conversion from 128
2018-08-09 15:35:54 +03:00
Reima Hyvönen
4957555eb3
Removed leftover from 939
2018-08-09 15:25:03 +03:00
Reima Hyvönen
28b165c971
Clarified some sections, added _MM_SHUFFLE macro
2018-08-09 15:23:01 +03:00
Reima Hyvönen
dd04df8667
testing whether the error is in both AVX2 functions
2018-08-03 11:49:00 +03:00
Reima Hyvönen
ed50d71fde
Switched some variables to different location, altered inter_recon_bipred_avx2 function
2018-08-02 16:08:59 +03:00
Reima Hyvönen
f5739a0028
Renaming and removing useless prints
2018-08-02 14:47:17 +03:00
Reima Hyvönen
bc09f59bb6
Edited some definitions
2018-08-02 11:54:53 +03:00
Marko Viitanen
ffbc178cf9
An attempt to fix checksums
2018-07-27 14:38:05 +03:00
Marko Viitanen
84b6a61193
Hack to fix split flag model for PCM use -> valid VVC bitstream
2018-07-27 14:29:31 +03:00
Marko Viitanen
90174f1143
Add more values to cabac debugging
2018-07-27 13:59:54 +03:00
Marko Viitanen
c6572d644f
Updated split_flag initialization to support Large CTUs in VVC
2018-07-27 12:32:45 +03:00
Marko Viitanen
7abadaafe4
Disable CTU splitting and configure max CTU sizes to 64x64
2018-07-27 11:04:21 +03:00
Marko Viitanen
6921e31502
Fix debugging functions
2018-07-27 11:03:16 +03:00
Marko Viitanen
37b5ce3d33
Change configurations to ease VVC debugging, max-BT-depth = 0
2018-07-26 16:12:11 +03:00
Marko Viitanen
792da1b7e0
Force PCM coding and fix PCM sample output
2018-07-26 11:05:31 +03:00
Marko Viitanen
5d4a2a004f
Remove dependent slice, WPP/tile and scaling list parameters from PPS
2018-07-26 10:43:21 +03:00
Marko Viitanen
31a6cbfe6d
Disable sign bit hiding
2018-07-26 10:41:35 +03:00
Marko Viitanen
9f2b429c66
Disable some features not used in VVC
...
- Part mode coding not used
- split transform flag not used
- last significant coeff pos swapping not used
2018-07-26 10:33:27 +03:00
Marko Viitanen
e84276f7f6
Fixed version string
2018-07-26 08:17:55 +03:00
Marko Viitanen
e38109d102
Enable QTBT and set correct general_profile_idc for Next
2018-07-25 12:24:17 +03:00
Marko Viitanen
079ca9b8b2
Disable tile/wpp flags in slice header
2018-07-25 11:19:53 +03:00
Marko Viitanen
b0ac7002e5
Disable VPS
2018-07-25 11:02:09 +03:00
Marko Viitanen
c5bf6a3774
Bugfix: add missing parameters to WRITE_U
2018-07-25 10:18:48 +03:00
Marko Viitanen
9befe35961
Modify slice header to conform to VVC
2018-07-25 10:17:42 +03:00
Marko Viitanen
95ce1e1a25
Modify parameter sets to conform to VVC
2018-07-25 10:05:11 +03:00
Arttu Ylä-Outinen
83555c3d6d
Enable --fast-residual-cost with fastest presets
2018-07-16 12:31:20 +03:00
Arttu Ylä-Outinen
c438bb4a19
Add an option to skip CABAC for residual costs
...
Adds command line option --fast-residual-cost=<limit>. When QP is below
the limit, estimates the cost of coding the residual coefficients from
the sum of absolute coefficients. Skipping CABAC is not worth it with
high QPs because there are fewer coefficients so CABAC is not as slow.
2018-07-16 12:31:20 +03:00
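A minimal sketch of the fast path described above, assuming the residual cost is approximated directly by the sum of absolute coefficient values. The helpers `coeff_abs_sum` and `residual_cost` and the `exact_cost` callback (standing in for the CABAC-based estimate) are hypothetical, not Kvazaar's actual functions, and any scaling of the sum is omitted.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: sum of absolute coefficient values. */
static uint32_t coeff_abs_sum(const int16_t *coeffs, size_t n) {
  uint32_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    sum += (uint32_t)abs(coeffs[i]);
  }
  return sum;
}

/* Hypothetical dispatcher: below fast_limit, skip CABAC and use the
   coefficient sum as a cost proxy; otherwise call the exact estimator. */
static uint32_t residual_cost(const int16_t *coeffs, size_t n, int qp,
                              int fast_limit,
                              uint32_t (*exact_cost)(const int16_t *, size_t)) {
  if (qp < fast_limit) {
    return coeff_abs_sum(coeffs, n);
  }
  return exact_cost(coeffs, n);
}
```

The split by QP mirrors the reasoning in the commit: at low QPs there are many coefficients, so avoiding CABAC saves the most time.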
Reima Hyvönen
a4bf77f208
Tested some extract functions
2018-07-12 09:29:32 +03:00
Reima Hyvönen
c05033a893
Even more useless vectors removed
2018-07-11 15:09:14 +03:00
Reima Hyvönen
884cb77238
Removed some not used vectors
2018-07-11 15:06:11 +03:00
Reima Hyvönen
792689a5ff
Removed for-loops, added extract instead
2018-07-11 14:56:41 +03:00
Reima Hyvönen
f9c7f6ee66
Added some break operations for AVX2 optimization
2018-07-11 14:15:38 +03:00
Reima Hyvönen
cc064da143
some more optimization for bipred
2018-07-11 11:27:54 +03:00
Reima Hyvönen
9a339eef89
Merge branch 'bipred_recon' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into HEAD
...
# Conflicts:
# build/kvazaar_lib/kvazaar_lib.vcxproj
2018-07-10 16:21:04 +03:00
Reima Hyvönen
a22cf03ddb
Added the no-movement function to the AVX2 strategies
2018-07-10 16:07:15 +03:00
Arttu Ylä-Outinen
b7474eb532
Fix SAO buffer sizes
...
Increases sizes of buffers used for SAO reconstruction to avoid stack
buffer overflow in AVX2 SAO reconstruction.
2018-07-05 15:56:30 +03:00
Arttu Ylä-Outinen
b37470e80f
Merge pull request #207 from jbeich/maltivec
...
Unbreak build on PowerPC if AltiVec isn't supported
2018-07-04 11:06:41 +03:00
Reima Hyvönen
ea83ae45f0
Working solution
2018-07-03 11:18:51 +03:00
Jan Beich
4f4bea7496
Check -maltivec is supported before using
...
PowerPC target may lack or have non-standard FPU:
$ cc -dumpmachine
powerpcspe-undermydesk-freebsd
$ cc -c -maltivec -Isrc src/strategies/altivec/picture-altivec.c
src/strategies/altivec/picture-altivec.c:1: error: AltiVec and E500 instructions cannot coexist
2018-07-02 23:25:23 +00:00
Jan Beich
b892d820f8
Clean up macOS includes on powerpc* after 93e1c9f1c3
...
strategyselector.c:426:25: machine/cpu.h: No such file or directory
2018-07-02 21:52:45 +00:00
Reima Hyvönen
17babfffa4
25.6 working optimization, ~50% faster than the original
2018-06-25 17:06:16 +03:00
Arttu Ylä-Outinen
2f995f4325
Merge pull request #205 from jbeich/powerpc
...
Unbreak build on non-Linux powerpc*
2018-06-19 13:28:00 +03:00
Arttu Ylä-Outinen
c1398ef818
Permit --period=1 with any GOP structure
...
All intra coding is a special case so it can be permitted even though
Kvazaar normally only supports intra periods that are divisible by the
GOP length.
2018-06-18 12:26:11 +03:00
Arttu Ylä-Outinen
abdebe0bf9
Fix --owf help message
...
The number of parallel frames is --owf plus one, not --owf minus one.
Fixes #204 .
2018-06-18 09:33:36 +03:00
Jan Beich
93e1c9f1c3
Add AltiVec detection for BSDs
...
strategyselector.c:377:26: linux/auxvec.h: No such file or directory
2018-06-17 15:38:24 +00:00
Miika Metsoila
98972d26c2
Document that the high tier requires level 4 or higher
2018-06-14 12:41:03 +03:00
Miika Metsoila
62b44efaa4
Write the encoding tier (main/high) into the bitstream
2018-06-14 12:41:03 +03:00
Arttu Ylä-Outinen
a343f6d587
Prepare for delta QPs at CU-level
...
- Replaces lcu_dqp_enabled with max_qp_delta_depth in encoder_control_t.
- Fixes set_cu_qps so that it can handle quantization groups of
arbitrary size.
- Fixes computation of QP predictors so that it works for quantization
groups of arbitrary size.
2018-06-13 15:36:19 +03:00
Arttu Ylä-Outinen
dc6b2024ea
Modify reference count asserts to fix data races
...
Changes asserts on the reference count of objects to assert the value
after KVZ_ATOMIC_INC instead of directly checking the value. Fixes some
data races detected by TSan.
2018-06-12 09:35:07 +03:00
Ari Lemmetti
4fb1c16c61
Add early termination for intra rdo when a zero coefficient block is found.
2018-06-08 21:03:07 +03:00
Ari Lemmetti
492529fb7a
Add the same comment to help message as well...
2018-05-30 14:13:15 +03:00
Ari Lemmetti
0d5972bf03
Add missing sort to intra transform split search so mode at 0 is the best
2018-05-21 13:10:38 +03:00
Sebastien Alaiwan
954bca7d6e
Fix memset parameter
2018-05-17 11:24:49 +02:00
Jaakko Laitinen
f9466efcbb
Close file on error
2018-05-15 11:50:16 +03:00
Reima Hyvönen
9fed29f950
optimization for inter_recon_bipred
2018-04-18 15:25:44 +03:00
Arttu Ylä-Outinen
5c585c4fbc
Update help message
...
Updates the default option values to match the medium preset.
2018-04-03 10:40:37 +03:00
Arttu Ylä-Outinen
2b4e22111a
Update presets
...
The new presets are slower but have better coding efficiency.
2018-04-03 10:37:30 +03:00
Arttu Ylä-Outinen
7185519a1b
Update command line help
...
- Adds missing default values.
- Adds help for --crypto and --key.
- Adds help for --rd=3.
- Adds help for --sao options.
- Some changes to help wording.
2018-03-23 14:33:04 +02:00
Arttu Ylä-Outinen
3606860504
Add --no-cpuid option
...
Equivalent to --cpuid=0.
2018-03-23 12:32:27 +02:00
Arttu Ylä-Outinen
fb462b25ef
Fix transform skip for inter
...
The transform skip flag in cu_info_t was stored under the intra
substruct even though transform skip can be used for inter as well. This
caused bitstream errors. Fixed by moving the flag out of the substruct.
2018-03-20 11:01:33 +02:00
Arttu Ylä-Outinen
b64e46707d
Skip raster scan step in TZ search
...
Raster scan is very slow and the BD-rate improvement is marginal.
2018-03-01 14:04:03 +02:00
Arttu Ylä-Outinen
6877064230
Add zero neighborhood check to TZ search
...
Adds an additional grid search step that starts from the zero motion
vector after the normal grid search. The search range for this step is
half of the normal range.
2018-03-01 14:02:13 +02:00
Arttu Ylä-Outinen
74a413c46a
Switch to star refinement in TZ search
2018-03-01 13:06:14 +02:00
Arttu Ylä-Outinen
ebee428ee1
Add loop termination to TZ grid search
...
Terminates the grid search if no better motion vector was found in the
last three iterations.
2018-03-01 13:06:06 +02:00
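The termination rule can be sketched as follows, assuming each grid-search iteration reports the best cost it found. `run_grid_search` and its array-of-costs input are hypothetical simplifications; the real search evaluates candidate motion vectors on the fly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: iter_costs[i] is the best cost found in iteration i.
   Stops after three consecutive iterations without improvement and returns
   the number of iterations actually run. */
static size_t run_grid_search(const uint32_t *iter_costs, size_t n_iters,
                              uint32_t start_cost, uint32_t *best_cost) {
  const size_t max_misses = 3;
  size_t misses = 0;
  size_t i;
  *best_cost = start_cost;
  for (i = 0; i < n_iters && misses < max_misses; ++i) {
    if (iter_costs[i] < *best_cost) {
      *best_cost = iter_costs[i];
      misses = 0; /* improvement resets the counter */
    } else {
      ++misses;
    }
  }
  return i;
}
```

With per-iteration costs 90, 80, 85, 82, 81, 70 and a starting cost of 100, the loop stops after five iterations with a best cost of 80; the later improvement to 70 is never reached, which is the speed/quality trade-off the rule accepts.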
Arttu Ylä-Outinen
4c175621dd
Fix TZ grid search and star refinement
...
- Changes TZ grid search and star refinement to keep the origin constant
instead of moving to the best position after each iteration.
- Changes star refinement to loop until there is no more improvement,
instead of running the step only once.
2018-03-01 12:56:57 +02:00
Arttu Ylä-Outinen
9c2d0074a2
Add rounding of motion vectors in inter search
...
When the starting point for integer motion estimation was selected among
the merge candidates, the candidate motion vectors were always rounded
down. This commit changes the rounding so that they are rounded to the
nearest integer MV instead.
2018-03-01 09:39:21 +02:00
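The difference between the two roundings is sketched below, assuming motion vectors are stored in quarter-pel units as in HEVC. The helper names are hypothetical, and exact half-pel ties are rounded toward positive infinity here, which is an assumption rather than necessarily what the commit does.

```c
#include <stdint.h>

/* Floor division by 4 without right-shifting negative values.
   Assumes v is within the normal MV range (no INT32_MIN). */
static int32_t floor_div4(int32_t v) {
  return (v >= 0) ? v / 4 : -((-v + 3) / 4);
}

/* Previous behavior: round a quarter-pel MV component down. */
static int32_t qpel_to_int_floor(int32_t mv) {
  return floor_div4(mv);
}

/* New behavior: round to the nearest full-pel position. */
static int32_t qpel_to_int_nearest(int32_t mv) {
  return floor_div4(mv + 2);
}
```

For example, a component of 3 (0.75 pel) becomes 0 with floor rounding but 1 with nearest rounding, so the integer search starts closer to the candidate's true position.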
Ari Lemmetti
662430d441
Select CU type based on SSD, transform unit tree and mode cost of luma and chroma on --rd=2
2018-02-22 19:26:48 +02:00
Arttu Ylä-Outinen
cb06cfeadb
Drop temporary arrays in bipred search
...
Changes bipred search to use the original source and reconstruction
arrays directly instead of copying them.
2018-02-14 11:20:51 +02:00
Arttu Ylä-Outinen
0ea516ba30
Move bipred search to a separate function
2018-02-14 09:56:53 +02:00
Arttu Ylä-Outinen
6f506be12d
Drop dynamic allocation from bipred search
...
Moves the temporary LCU struct used in bipred search from the heap to
the stack. The single malloc call was a huge bottleneck in bipred.
2018-02-14 09:55:02 +02:00
Arttu Ylä-Outinen
7155dd0db7
Add negative references to L1 list
...
Changes reference index list creation so that the negative references
are added to L1 in addition to L0 when biprediction is enabled and no
reordering of pictures is done. Biprediction can now be used with the
low-delay GOP structure.
2018-02-07 14:54:52 +02:00
Arttu Ylä-Outinen
4b24cd03a2
Update for crypto++ 6.0.0 compatibility
...
Changes the crypto module to use unsigned char instead of byte. The byte
typedef is no longer included in the global namespace in crypto++ 6.0.0.
See https://github.com/weidai11/cryptopp/issues/442 .
Fixes #184 .
2018-02-05 13:35:03 +02:00
Arttu Ylä-Outinen
8c53417006
Check zero coefficient cost for inter
...
Checks the cost of flushing all coefficients of an inter block to zero.
This is much faster than doing full RDOQ but can still reduce bitrate
significantly. Encoding speed is increased since fewer coefficient bits
have to be coded with CABAC.
2018-01-29 12:41:56 +02:00
Arttu Ylä-Outinen
018b5ffa64
Move inter CU reconstruction to a new function
...
Moves code for reconstructing all PUs in an inter CU to a new function
kvz_inter_recon_cu in inter.c.
2018-01-24 15:05:39 +02:00
Arttu Ylä-Outinen
405b8c1069
Refactor inter MVD cost functions
...
Moves duplicate code for writing the MVD of a single motion vector from
kvz_get_mvd_coding_cost_cabac and encoder_inter_prediction_unit to a new
function.
2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen
c1cca1ad7f
Refactor inter MV candidate selection
...
Moves duplicate code for checking the best MV candidate from functions
calc_mvd_cost, search_pu_inter_ref and search_pu_inter to a new
function.
2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen
9067aa4535
Remove an unnecessary copy in SMP/AMP search
...
SMP/AMP search is performed using a lower work tree level than the
normal inter search so the prediction info must be copied up if an
SMP/AMP mode is chosen. Previously pixels and coefficients were copied as
well. Changed to only copy prediction info.
2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen
89a930d6dd
Add part mode bitcost when using SMP/AMP blocks
2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen
fc43643ba5
Use a transform split for SMP and AMP blocks
2018-01-18 10:36:25 +02:00
Arttu Ylä-Outinen
c74ede148b
Fix CBF flags for 4x4 luma blocks
...
CBF flags were not being propagated to the upper level from blocks of
size 4x4.
2018-01-18 10:36:25 +02:00
Arttu Ylä-Outinen
0a69e6d18f
Fix selection of transform function for 4x4 blocks
...
DST function was returned for inter luma transform blocks of size 4x4
even though they must use DCT. Fixed by checking the prediction mode of
the block in addition to whether it is chroma or luma.
2018-01-18 10:36:25 +02:00
Miika Metsoila
bcedfd6669
Remove the usage of errno in me-steps argument parsing
2018-01-16 14:38:43 +02:00
Miika Metsoila
39ed36830e
Merge branch 'me_steps'
2018-01-16 14:22:59 +02:00
Miika Metsoila
61213e3ad9
Improve step parameter parsing and usage
2018-01-10 15:16:52 +02:00
Arttu Ylä-Outinen
649113a821
Fix inter search being used for 4x4 blocks
...
When 4x4 intra blocks are enabled and inter search is limited to 16x16
and larger blocks, it is possible that inter search is accidentally done
for 4x4 blocks. Fixed by checking that block size is at least 8x8 before
doing inter search.
2018-01-10 14:21:48 +02:00
Miika Metsoila
e8e0e7596a
Add a step-cutoff parameter for motion estimation search
2017-12-22 14:04:25 +02:00
Miika Metsoila
4e13608b01
Merge branch 'diamond_search'
2017-12-18 14:11:53 +02:00
Miika Metsoila
2cde0d1a18
Document diamond search option
2017-12-12 14:45:01 +02:00
Miika Metsoila
b923b63b42
Add diamond search
2017-12-12 14:40:14 +02:00
Ari Lemmetti
14892fda00
Replace simple coefficient cost estimation with CABAC. Substantial improvement.
...
Approximation proved to be too inaccurate while not actually giving that much speedup.
2017-12-10 01:23:48 +02:00
Miika Metsoila
ea79069dc8
Fix a type warning in encmain.c
2017-12-08 16:22:40 +02:00
Miika Metsoila
6aa4cd7528
Fix type warnings
2017-12-08 16:16:36 +02:00
Miika Metsoila
b3486b5114
Fix gcc/clang warnings and errors in cfg.c
2017-12-08 16:09:00 +02:00
Miika Metsoila
bac07457ea
Merge branch 'hevc_level'
2017-12-08 15:57:38 +02:00
Miika Metsoila
c67a24e6ec
Update readme and --help text
2017-12-07 12:32:46 +02:00
Ari Lemmetti
713e694d82
Define HAVE_STRUCT_TIMESPEC on Visual Studio 2015 and later
...
Fixes redefinition of timespec that Pthreads-Win32 does even if it has been already defined.
2017-12-05 18:26:12 +02:00
Miika Metsoila
f64d42169f
Improve bitrate checking to accommodate non-integer and less than 1 framerates
2017-12-01 17:20:12 +02:00
Miika Metsoila
57cf92d35f
Implement level's bitrate limit checking during encoding
2017-11-28 16:19:44 +02:00
Miika Metsoila
021fb27787
Add high-tier flag
2017-11-20 16:05:28 +02:00
Miika Metsoila
d249059d61
Minor refactoring of level checking
2017-11-20 13:25:26 +02:00
Arttu Ylä-Outinen
cf85d52b9d
Kvazaar version 1.2.0
2017-11-17 15:23:33 +02:00
Miika Metsoila
4c1512e8c5
Add a check for maximum picture width and height for the given level
2017-11-15 16:39:59 +02:00
Arttu Ylä-Outinen
4cb054295a
Fix linkers
...
Overrides the linkers used for kvazaar, libkvazaar.la and kvazaar_tests.
When crypto++ is enabled, the C++ linker is used and when it is
disabled, the C linker is used.
This removes the need to explicitly specify -lstdc++ in configure when
crypto++ is used and fixes the build with crypto++ when libstdc++ is not
installed.
2017-11-13 15:09:38 +02:00
Miika Metsoila
f9a4aba867
Update documentation, fix input fps default value, remove 0 as default level
2017-11-09 16:53:31 +02:00
Miika Metsoila
ebba0a4f01
Test if input conforms to its level's limits (excluding bitrate)
2017-11-08 16:15:41 +02:00
Miika Metsoila
fb4d0c3cf2
Move level argument parsing to the correct place and give it initial values
2017-11-03 15:47:35 +02:00
Miika Metsoila
61a31054e1
Add level command-line parameter
2017-11-03 13:04:05 +02:00
Arttu Ylä-Outinen
9974380cdd
Fix bipred and temporal MVP
...
- Fixes two errors in calculating the POC for the reference frame for
temporal candidate MV scaling.
- Fixes using the MV for the wrong direction when the temporal MV
predictor block uses bi-prediction.
Fixes #160 .
2017-10-25 12:26:41 +03:00
Arttu Ylä-Outinen
841597e123
Fix picture and slice types
...
Changes handling of intra pictures for --gop=8 so that every picture
with POC divisible by the intra period is intra. The first picture is
IDR and the rest of the intra pictures are CRA. POC is not reset at CRA
pictures. The leading pictures that follow the CRA picture are changed
to RASL so they are allowed to refer to pictures before the CRA picture.
Changes inter slice types to P when the L1 reference list is empty and
to B otherwise.
In all-intra, all pictures are now IDR pictures with POC zero.
2017-10-20 13:35:26 +03:00
Jaakko Laitinen
957b6850c3
Change ref list printout to match the HM decoder printout
2017-09-25 13:48:56 +03:00
Arttu Ylä-Outinen
20aea8df63
Fix POCs when using --gop=8
...
When using --gop=8 with an intra period greater than one, a single POC
would be skipped before every intra frame. This commit fixes the problem
by turning the intra frames into BLA frames with leading pictures when
using --gop=8.
2017-09-19 09:31:58 +03:00
Miika Metsoila
6e00f63469
Remove unused variables from search_pu_inter_ref function
2017-09-18 15:36:37 +03:00
Miika Metsoila
7b0101ce3d
Merge branch 'reflist_changes'
...
# Conflicts:
# src/encoderstate.c
# src/search_inter.c
2017-09-18 14:59:37 +03:00
Miika Metsoila
769b17768d
Change max function to MAX macro for clang/gcc compatibility.
...
Remove couple of unnecessary comments
2017-09-15 14:21:51 +03:00
Miika Metsoila
5f7c5443a3
Remove inter.poc
2017-09-12 14:23:19 +03:00
Miika Metsoila
6bd78a3da7
Reverse L0 list sort direction
2017-09-12 14:23:18 +03:00
Miika Metsoila
83dc7e7f50
Made L0 sorted and fixed mv_ref_coded in search_pu_inter
2017-09-12 14:23:18 +03:00
Timothe FRIGNAC
d3362a238e
changed strtod to strtol
2017-08-31 15:14:31 +02:00
Timothe FRIGNAC
3a1ab54ff0
Fixed memory leaks
2017-08-31 11:51:41 +02:00
Timothe FRIGNAC
466297fd77
Fixed build error
2017-08-29 17:01:18 +02:00
Timothe FRIGNAC
2e130912cb
Add --key opt
2017-08-28 17:15:13 +02:00
Miika Metsoila
a5f4cf09b5
Switched from storing POCs in inter.poc to state->frame->refLXs array
2017-08-21 16:34:57 +03:00
Arttu Ylä-Outinen
409d2114f0
Fix motion vector constraints
...
Fixes integer motion vectors being constrained more than what was
necessary when using --mv-constraint or --wpp.
2017-08-11 14:41:36 +03:00
Arttu Ylä-Outinen
7144a00beb
Rewrite thread queue
...
Changes thread queue so that only the jobs that are ready to run are
stored in the queue. Other jobs are kept track of by pointers in the
reverse dependency lists of other jobs. When a job is ready to run it is
appended to the queue. The job queue is stored as a linked list.
The definitions of threadqueue_queue_t and threadqueue_job_t are moved
to the .c file, turning them into opaque structs.
Makes thread queue code simpler. Fixes some TSan errors.
2017-08-11 14:18:12 +03:00
Arttu Ylä-Outinen
bc47fe94af
Drop thread queue debug code
2017-08-11 14:18:12 +03:00
Eemeli Kallio
e5cbc7a205
--sao now enables full sao
2017-08-11 13:26:55 +03:00
Eemeli Kallio
4c3453d26f
Fixed issue with no-sao argument
2017-08-11 13:12:22 +03:00
Eemeli Kallio
8674c0f5ee
Added parameter for band and edge sao.
2017-08-11 11:57:09 +03:00
Eemeli Kallio
d9b93ea368
Added possibility to skip edge or band sao.
2017-08-11 11:51:49 +03:00
Arttu Ylä-Outinen
4b73bdd9aa
Skip checked motion vectors in early termination
...
Changes the second iteration of early termination to skip the motion
vectors that were already checked in the first iteration.
2017-08-09 14:29:09 +03:00
Arttu Ylä-Outinen
606d441362
Skip computing MV cost twice in hexagon search
...
Changes the first step of hexagon search to skip the zero offset since
the cost of the motion vector has already been computed.
2017-08-09 14:29:09 +03:00
Arttu Ylä-Outinen
fa4648061d
Add mv, cost and bitcost to inter_search_info_t
2017-08-09 14:29:08 +03:00
Arttu Ylä-Outinen
328f051d7f
Put inter search parameters in a single struct
...
Adds struct inter_search_info_t for holding the parameters that are used
by most functions related to inter search. Passing the parameters in
a single struct greatly reduces the number of parameters for many
functions.
2017-08-09 14:27:53 +03:00
Miika Metsoila
0dd069f8af
Fixed using wrong POC in add_temporal_candidate
2017-08-09 13:50:21 +03:00
Miika Metsoila
25e0a954c7
Fixed 2 bugs causing incorrect video output
2017-08-09 13:50:21 +03:00
Arttu Ylä-Outinen
24ecddd2a5
Fix wrong strides in SAO reconstruction
...
Functions kvz_sao_reconstruct and encoder_sao_reconstruct used
frame->width as the stride instead of frame->rec->stride when accessing
frame->rec->data. This caused errors when using tiles and SAO.
2017-08-01 15:40:49 +03:00
Arttu Ylä-Outinen
f0bf959d17
Fix alignment errors in 32-bit build with MSVC
...
Changes the work_tree parameter in search.c functions from an array to
a pointer. Fixes "formal parameter with requested alignment of 8 won't
be aligned" errors.
2017-07-28 09:27:02 +03:00
Arttu Ylä-Outinen
9694bd2fae
Fix build on 32-bit systems
...
Function coeff_abs_sum_avx2 that was added in e950c9b was outside the
AVX2 #if directive.
2017-07-28 09:19:29 +03:00
Arttu Ylä-Outinen
ecb0275cdd
Store CU arrays as pointers to the main array
...
Changes field state->tile->frame->cu_array->data to point to the CU
array in the main encoder state. Removes the need to copy the CU array
to the main CU array after search.
2017-07-28 08:36:45 +03:00
Arttu Ylä-Outinen
e950c9b101
Add AVX2 implementation for coefficient sum
2017-07-28 07:39:36 +03:00
Arttu Ylä-Outinen
d50ae6990c
Add sum of absolute coefficients to strategies
2017-07-28 07:39:15 +03:00
Arttu Ylä-Outinen
59faca0646
Skip CABAC coefficient cost for --rd=0
2017-07-28 07:33:03 +03:00
Arttu Ylä-Outinen
19e051ea40
Reduce intra threshold
...
Reduces intra threshold for --rd=0 from 20 to 8. Threshold of 20
increased BD-Rate too much.
2017-07-25 13:26:38 +03:00
Arttu Ylä-Outinen
e9cf15465e
Fix inter cost in bipred
...
The cost of coding MV ref indices and MV direction was added to bitcost
but not inter cost. Fixed by adding the extra bits to inter as well.
2017-07-24 15:24:04 +03:00
Arttu Ylä-Outinen
edbe00763e
Drop extra parameter in kvz_image_calc_sad
...
Drops the parameter max_lcu_below which was always set to -1.
2017-07-24 15:21:19 +03:00
Arttu Ylä-Outinen
ffac29061f
Fix extrapolated inter SATD
2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen
631ef53d2a
Fix inter cost calculations
...
Inter costs are computed using SAD except when fractional motion
estimation or bi-prediction is enabled. This commit changes
search_pu_inter_ref to recalculate the cost with SATD. Fixes inter/intra
cost comparisons since intra costs are always SATD costs.
2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen
6ce2fb1238
Add pixel offsets to encoder_state_config_tile_t
...
Adds fields offset_x and offset_y to encoder_state_config_tile_t.
2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen
2380ba0d41
Reduce copying in kvz_get_coeff_cost
...
Changes function kvz_get_coeff_cost to only copy the CABAC contexts and
not the whole encoder state.
Other threads could be simultaneously using the other parts of the
encoder state. Only copying the CABAC fixes a TSan data race warning.
2017-07-24 12:38:41 +03:00
Arttu Ylä-Outinen
24b462f801
Align coefficients to 8 bytes
...
Adds alignment attribute to lcu_coeff_t. The coefficients are sometimes
handled as 64-bit integers containing four coefficients so the arrays
should be aligned to 8 bytes.
Fixes a UBSan error about misaligned reads.
2017-07-24 12:37:37 +03:00
Arttu Ylä-Outinen
5ddb43c6fe
Fix undefined left shifts in rdo
...
Replaces left shifts by multiplications when the operand may be
a negative value. Left shift of a negative value is undefined behavior.
2017-07-24 12:35:10 +03:00
Arttu Ylä-Outinen
d1e64ad62b
Fix undefined left shifts
...
Replaces left shifts by multiplications when the operand may be
a negative value. Left shift of a negative value is undefined behavior.
2017-07-20 11:15:30 +03:00
Arttu Ylä-Outinen
07b5fb9caf
Fix out-of-bounds read in encoderstate
...
When calling encoder_state_encode_leaf with POC 0, index -1 of the GOP
array would be accessed. Fixed by skipping the code for I-frames.
2017-07-20 11:15:30 +03:00
Arttu Ylä-Outinen
8c4a3473a8
Change --owf=auto and --threads=auto selection
...
Changes OWF selection so that it is chosen based on the maximum number
of parallel CTUs. Number of threads is limited to prevent overhead from
extra threads.
2017-07-20 09:42:28 +03:00
Arttu Ylä-Outinen
4fc9b743c1
Drop an unnecessary pthread_cond_broadcast
...
Drop pthread_cond_broadcast on threadqueue->cond in function
kvz_threadqueue_waitfor. The broadcast caused threads to be woken up
more often than necessary.
2017-07-19 11:09:30 +03:00
Arttu Ylä-Outinen
14003c6a30
Disable printing PSNR with --no-psnr
2017-07-19 10:38:37 +03:00
Arttu Ylä-Outinen
e90bde5c62
Clarify PSNR output
...
Adds letters Y, U and V to the PSNR output to make it clearer that the
printed values are the luma and chroma PSNR.
2017-07-19 10:33:43 +03:00
Arttu Ylä-Outinen
fdb3480b54
Enable strategies for SAO reconstruction
...
Re-enables strategies for SAO reconstruction. They were disabled in
commit ec9ff42.
2017-07-11 10:35:18 +03:00
Arttu Ylä-Outinen
333dba3884
Add static to SAO strategies
2017-07-11 10:02:01 +03:00
Miika Metsoila
e8cc2d8f6a
Small fixes
2017-07-07 13:58:19 +03:00
Arttu Ylä-Outinen
67a60a35e3
Fix invalid calls to normalize_lcu_weights
...
Changes encoder_state_init_new_frame to only call normalize_lcu_weights
when the weights have been written to the array and rate control is
enabled. When rate control is disabled, the weights are not used.
2017-07-07 11:05:31 +03:00
Arttu Ylä-Outinen
563bc26e71
Fix out-of-bounds read in AVX2 SAO
...
AVX2 version of SAO loaded offsets with a 256 bit read even though there
are only five 32 bit integers.
2017-07-06 13:04:52 +03:00
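One way to avoid the over-read described above is to stage the five offsets in a zero-padded buffer of eight 32-bit lanes before the 256-bit load. This is a portable sketch of that staging step only, and it is not necessarily what the commit does. On AVX2, _mm256_maskload_epi32 is another option. The helper name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define NUM_SAO_OFFSETS 5  /* five 32-bit offsets, per the commit message */

/* Copy the offsets into an eight-element buffer so that a 256-bit
 * (8 x int32) vector load from 'padded' stays within valid memory.
 * The three unused lanes are zeroed. */
static void pad_sao_offsets(const int32_t offsets[NUM_SAO_OFFSETS],
                            int32_t padded[8])
{
  memset(padded, 0, 8 * sizeof(int32_t));
  memcpy(padded, offsets, NUM_SAO_OFFSETS * sizeof(int32_t));
}
```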
Arttu Ylä-Outinen
0850b17f96
Drop get_wpp_limit in search_inter
...
WPP limit for motion vectors is now computed inside fracmv_within_tile.
2017-07-05 13:22:53 +03:00
Arttu Ylä-Outinen
2a85f0f5a4
Move hard-coded MV limits to encoder_control_t
...
Adds field max_inter_ref_lcu to encoder_control_t. It is used to set up
inter-LCU dependencies in encoder_state_encode_leaf and restrict motion
vectors in fracmv_within_tile.
2017-07-05 13:22:53 +03:00
Arttu Ylä-Outinen
bb5354f7e2
Relax inter-CTU dependencies when SAO is off
...
When using WPP and OWF, the first CTU of a row depends on the last CTU
of the row below in the reference frame. This is necessary when SAO is
enabled since we currently do SAO for a whole CTU row at a time. When
SAO is disabled, however, it is unnecessary to wait for the whole row.
Changes CTUs to depend only on the CTU below in the reference frame
instead of the whole row when WPP and OWF are enabled and SAO disabled.
Gives a significant speedup when running on a machine with many CPU
cores.
2017-07-05 13:21:06 +03:00
Arttu Ylä-Outinen
1efa2708b2
Do SAO reconstruction for a single CTU at a time
...
Moves SAO reconstruction into encoder_state_worker_encode_lcu instead of
doing it in a separate step for the whole CTU row. Reconstruction of the
rightmost 10 pixels and bottommost 10 pixels of a CTU is delayed until
the neighboring CTU has been deblocked.
Doing SAO for the whole CTU row at a time caused unnecessary inter-CTU
dependencies when using WPP and OWF. The first CTU of a row would need
to wait until SAO was done for the row below in the previous frame.
Moving SAO reconstruction to immediately after deblocking each CTU fixes
this problem.
2017-07-04 15:14:31 +03:00
Arttu Ylä-Outinen
ec9ff42077
Rewrite SAO recon to handle arbitrary sized blocks
...
Adds width and height parameters to function kvz_sao_reconstruct and
changes it to take coordinates in units of pixels. This will be useful
for doing SAO for areas smaller than a whole CTU.
2017-06-30 16:09:18 +03:00
Miika Metsoila
dcd7acf4fd
Fixed crash and incorrect info output
2017-06-27 16:05:15 +03:00
Miika Metsoila
f8b6234fdb
Changes to reference lists to behave more like L0/L1 lists from the specification
2017-06-27 16:05:15 +03:00
Arttu Ylä-Outinen
2c66e0bbd2
Fix warnings about invalid reads in AVX2 ipol
...
AVX2 filter functions read pixels in chunks of 8 or 16 bytes. At the end
of the block, the read goes out of the bounds of the pixels array. The
extra pixels do not affect the result.
Fixes valgrind complaining about the invalid reads by allocating 5 extra
pixels in kvz_get_extended_block_avx2.
2017-06-22 09:37:55 +03:00
Arttu Ylä-Outinen
4d20e156db
Fix handling intra period not multiple of GOP length
...
With low delay GOP structure, it is possible to use an intra period that
is not a multiple of the GOP structure length. Commit 00c9f52 changed
encoder_state_init_new_frame to reset POC on intra frames. GOP offset,
however, was not reset, resulting in invalid POCs and references for the
following frames.
This commit changes function kvz_encoder_feed_frame so that GOP offset
is correctly reset on intra frames.
2017-06-22 09:29:00 +03:00
Arttu Ylä-Outinen
00c9f52bd4
Fix setting picture type when using GOP
...
Changes encoder_state_init_new_frame to set intra frame pictype to
KVZ_NAL_IDR_W_RADL even when using GOP.
2017-06-21 13:21:47 +03:00
Arttu Ylä-Outinen
f54a25f112
Fix crash when immediately closing encoder
...
When closing the encoder, the pictures stored in the input frame buffer
are freed by repeatedly calling kvz_encoder_feed_frame. If the encoder
was closed immediately after opening it, kvz_encoder_feed_frame would be
called with an unprepared encoder state. This would trigger an assert.
Fixed by changing kvz_encoder_feed_frame so that it does not require the
encoder state to be prepared.
2017-06-15 11:57:46 +03:00
Arttu Ylä-Outinen
b74e0458fd
Set inter transform depth to zero
...
Sets max_transform_hierarchy_depth_inter to 0 in SPS. This saves some
bits because split_transform_flag does not need to be coded for inter
blocks.
When SMP and AMP blocks are enabled the depth is set to 1 instead.
Otherwise inter split flag would default to 1 for SMP and AMP blocks,
resulting in an unnecessary transform split.
2017-06-08 10:08:20 +03:00
Arttu Ylä-Outinen
8dd01ba5a9
Refactor helper functions in search
...
Combines functions lcu_set_intra_mode and lcu_set_inter_pu to a single
function. Removes some duplicated code.
2017-06-06 10:32:09 +03:00
Arttu Ylä-Outinen
1bbecf7584
Refactor work tree copy functions
...
Extracts common code shared by work_tree_copy_up and work_tree_copy_down
to a separate function.
2017-06-06 10:32:00 +03:00
Arttu Ylä-Outinen
2b169d5d63
Fix crash in kvazaar_close
...
Changes kvazaar_close to stop all threads before freeing encoder states.
Fixes a crash when the encoder is closed before all pictures have been
encoded.
2017-06-02 10:05:33 +03:00
Arttu Ylä-Outinen
eb9a05b7ef
Fix memory leak
...
Changes kvazaar_close to free the remaining pictures in the input
frame buffer. Fixes a memory leak when the encoder is closed while there
are pictures left in the buffer.
2017-06-01 15:39:35 +03:00
Arttu Ylä-Outinen
8b2483ca1c
Combine intra reconstruction functions
...
Replaces function kvz_intra_recon_lcu_luma and
kvz_intra_recon_lcu_chroma in intra.c with function kvz_intra_recon_cu.
The new function can handle reconstruction for both luma and chroma.
Removes some duplicated code.
2017-05-24 12:07:31 +03:00
Arttu Ylä-Outinen
e67fdb853d
Move intra leaf TB recon to a separate function
...
Moves code for intra leaf transform block reconstruction from functions
kvz_intra_recon_lcu_luma and kvz_intra_recon_lcu_chroma to a new
function intra_recon_tb_leaf. Removes some duplicated code.
2017-05-24 12:07:31 +03:00
Arttu Ylä-Outinen
13d2fdbd21
Drop unused kvz_videoframe_get_cu functions
2017-05-24 11:15:31 +03:00
Arttu Ylä-Outinen
f5eef7f33c
Use luma pixel coordinates in encode_coding_tree
...
Changes functions encode_intra_coding_unit and encode_coding_tree to
take coordinate arguments in units of luma pixels instead of 8 px
blocks. This should make the code easier to understand.
2017-05-24 11:15:31 +03:00
Arttu Ylä-Outinen
525a5180ff
Combine intra CU encoding functions
...
Merges functions encode_intra_coding_unit and
encode_intra_coding_unit_encry. Removes a lot of duplicated code.
2017-05-24 11:12:40 +03:00
Arttu Ylä-Outinen
610c91b0c5
Use luma pixel coordinates in TU coding functions
...
Changes functions encode_transform_unit and encode_transform_coeff to
take coordinate arguments in units of luma pixels instead of 4 px
blocks. This should make the code easier to understand.
2017-05-23 15:36:16 +03:00
Arttu Ylä-Outinen
2e8838de6e
Fix crash when crypto compiled in but disabled
...
When kvazaar was built with crypto++ but run without using
encryption features, kvazaar attempted to delete an uninitialized crypto
handle. Fixed by setting the handle to NULL in kvz_encoder_state_init.
2017-05-23 14:01:48 +03:00
Arttu Ylä-Outinen
2f2c281e8e
Fix a memory leak in crypto
...
A CryptoPP::CFB_Mode<CryptoPP::AES>::Encryption was allocated at the
beginning of encoder_state_encode_leaf and was never freed. This commit
changes encoder_state_worker_encode_lcu to delete the CFB_Mode. Also
moves crypto handle from encoder_state_config_tile_t to encoder_state_t
so that it can be safely deleted without affecting other threads in the
same tile.
2017-05-23 11:51:25 +03:00
Arttu Ylä-Outinen
22155950c1
Rewrite crypto to conform to kvazaar code style
2017-05-23 11:51:25 +03:00
Arttu Ylä-Outinen
6829865190
Fix inline declaration in intra_mode_encryption
...
Moves the inline declaration of intra_mode_encryption before the type
and changes it to use the INLINE macro. Inline declaration after type
triggered a warning on GCC.
2017-05-23 11:50:32 +03:00
Arttu Ylä-Outinen
5f8e17d4ba
Eliminate a race condition in threadqueue
...
Fixes the order of acquiring locks for the job and its dependency in
kvz_threadqueue_job_dep_add. The dependency is locked before the job
that depends on it. This is the same order as in threadqueue_worker.
Acquiring the locks in different order in kvz_threadqueue_job_dep_add
and threadqueue_worker would sometimes result in a deadlock.
2017-05-18 12:25:53 +03:00
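The fix above relies on a fixed lock order. A minimal sketch of that rule with a hypothetical, heavily reduced job type: the dependency's lock is always taken before the dependent job's lock. Deadlock freedom only holds if every other place that takes both locks, such as the worker, uses the same order.

```c
#include <pthread.h>

#define MAX_REVERSE 8  /* assumed capacity for this sketch */

/* Hypothetical job type, not kvazaar's threadqueue_job_t. */
typedef struct job {
  pthread_mutex_t lock;
  int ndepends;                     /* unfinished jobs this job waits for */
  struct job *reverse[MAX_REVERSE]; /* jobs that wait for this one */
  int nreverse;
} job_t;

/* Record that 'job' depends on 'dep'. Returns 1 on success.
 * Lock order: dependency first, then the dependent job. */
static int job_dep_add(job_t *job, job_t *dep)
{
  pthread_mutex_lock(&dep->lock);
  pthread_mutex_lock(&job->lock);
  int ok = dep->nreverse < MAX_REVERSE;
  if (ok) {
    dep->reverse[dep->nreverse++] = job;
    job->ndepends++;
  }
  pthread_mutex_unlock(&job->lock);
  pthread_mutex_unlock(&dep->lock);
  return ok;
}
```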
Arttu Ylä-Outinen
4b213477f0
Return best MV from inter early terminate
...
When using --me-early-termination=sensitive, early termination of inter
search used to always return the starting point if no tested motion
vector was good enough to continue the search. This commit changes
early_termination to always return the best motion vector and cost
found.
2017-05-18 09:05:14 +03:00
Arttu Ylä-Outinen
382636de55
Fix handling too large QPs
...
Changes kvz_config_validate to output an error if the given QP is out of
range and changes kvz_set_picture_lambda_and_qp to clip the QP to the
valid range if it is too large after applying the QP offset from the GOP structure.
2017-05-17 12:41:51 +03:00
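A minimal sketch of the clipping step mentioned above. It assumes the valid range is the HEVC slice QP range [-QpBdOffsetY, 51] with QpBdOffsetY = 6 * (bit_depth - 8). The commit does not say which bounds kvazaar uses, so treat the range and the helper name as assumptions.

```c
/* Clip a QP to [-6 * (bit_depth - 8), 51]; for 8-bit video, [0, 51]. */
static int clip_qp(int qp, int bit_depth)
{
  const int min_qp = -6 * (bit_depth - 8);
  const int max_qp = 51;
  if (qp < min_qp) return min_qp;
  if (qp > max_qp) return max_qp;
  return qp;
}
```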
Arttu Ylä-Outinen
de8b59c681
Drop unused function kvz_coefficients_blit
2017-05-12 16:48:30 +03:00
Arttu Ylä-Outinen
bcfa5a3cd9
Add a comment explaining the coefficient order
2017-05-12 16:46:57 +03:00
Arttu Ylä-Outinen
95775a1645
Change coefficient storage order
...
Changes coefficient storage order to a zig-zag order. Reduces
unnecessary copying of coefficients to temporary arrays.
2017-05-12 16:46:57 +03:00
Arttu Ylä-Outinen
9395867a9a
Quantize all colors in a single traversal
...
Changes kvz_quantize_lcu_residual to process all three colors in
a single traversal of the TU tree.
2017-05-12 16:42:41 +03:00
Arttu Ylä-Outinen
1e58fd6b16
Split kvz_quantize_lcu_residual
...
Splits kvz_quantize_lcu_residual to two functions that handle the TU
tree recursion and quantization of a single TU.
2017-05-12 16:42:41 +03:00
Arttu Ylä-Outinen
cc87e0dcc7
Combine luma and chroma quantization functions
...
Replaces functions kvz_quantize_lcu_luma_residual and
kvz_quantize_lcu_chroma_residual in transform.c with function
kvz_quantize_lcu_residual. The new function can handle any of the YUV
colors. Removes some duplicated code.
2017-05-12 16:42:41 +03:00
Arttu Ylä-Outinen
1357dd0599
Pass coeffs through encoder state
...
Changes the way coefficients are passed from kvz_search_lcu to
kvz_encode_coding_tree. Drops fields coeff_y, coeff_u and coeff_v in
videoframe_t and instead passes them through field coeff in
encoder_state_t.
2017-05-12 16:42:41 +03:00
Eemeli Kallio
2cad3173ec
Reduced amount of modes for search_intra_rdo
2017-05-12 15:56:07 +03:00
Arttu Ylä-Outinen
26adef4492
Merge branch 'erp-aqp'
2017-05-12 15:05:24 +03:00
Eemeli Kallio
55e0e65733
Added INLINE to kvz_get_ic_rate and kvz_get_coded_level in rdo.c
2017-05-12 15:03:30 +03:00
Arttu Ylä-Outinen
ee3d4d0e78
Add adaptive QP for 360 degree video
...
Adds option --erp-aqp for enabling adaptive QP for 360 degree video with
equirectangular projection. When projected onto a spherical surface,
the middle part of the video covers a relatively larger area than the top
and bottom parts. Enabling --erp-aqp sets up a ROI delta QP array which
uses higher QPs for the top and bottom of the video and lower QPs for
the middle part.
2017-05-11 12:31:53 +03:00
Arttu Ylä-Outinen
79cb3a2fd3
Permit negative QP deltas in ROI
...
Delta QPs should not be arbitrarily restricted to positive values.
2017-05-11 12:13:47 +03:00
Arttu Ylä-Outinen
edfbd6f122
Add field lcu_dqp_enabled to encoder_control_t
...
Delta QPs for LCUs are enabled when either ROI coding or rate control is
enabled. Having a single field is simpler than always checking whether
ROI or rate control is enabled.
2017-05-11 12:13:47 +03:00
Arttu Ylä-Outinen
2f2405dfe6
Fix crash when PU depth is limited
...
When video width or height was not a multiple of the smallest CU size,
no prediction would be performed at the border CUs. Kvazaar would later
crash at an assertion failure when attempting to write the bitstream for
the CU.
Fixed by permitting inter and intra prediction when the CU split is
forced, even if CUs of that size would otherwise be disabled.
2017-04-27 10:35:48 +03:00
Arttu Ylä-Outinen
9130b5107c
Change handling of infinite PSNR in encmain
...
Changes encmain to print 999.99 as PSNR when SSE is zero. This behavior
is in line with HM. Previously SSE was set to 99 when it was zero.
2017-04-27 10:35:13 +03:00
Arttu Ylä-Outinen
a9c878b535
Fix crash with WPP when threads are disabled
...
When WPP is enabled, a reference to SAO reconstruction job is copied
from the wavefront to the main encoder state. However, when threads are
disabled, the job is a null pointer and dereferencing it crashes the
encoder. Fixed by adding a null pointer check.
2017-04-24 12:59:57 +03:00
Arttu Ylä-Outinen
2991962033
Add reference counting to threadqueue_job_t
...
Both the thread queue and the encoder states hold pointers to the thread
queue jobs. It is possible that a job is removed from the thread queue
and freed while the encoder state is still using it. This commit adds
reference counting to threadqueue_job_t in order to fix the problem.
Fixes #161.
2017-04-12 16:13:52 +03:00
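The reference counting described above can be sketched as follows. The type and function names are hypothetical. Each holder of the pointer (the queue, an encoder state) owns one reference, and whoever drops the last one frees the job. C11 atomics make the count safe to update from several threads.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical job type with an atomic reference count. */
typedef struct {
  atomic_int refcount;
  int payload;
} refjob_t;

static refjob_t *job_create(int payload)
{
  refjob_t *job = malloc(sizeof *job);
  if (!job) return NULL;
  atomic_init(&job->refcount, 1);  /* the creator holds the first reference */
  job->payload = payload;
  return job;
}

/* Take an additional reference, e.g. when an encoder state stores the pointer. */
static refjob_t *job_copy_ref(refjob_t *job)
{
  atomic_fetch_add(&job->refcount, 1);
  return job;
}

/* Drop a reference. The last holder frees the job. Returns 1 if freed. */
static int job_free_ref(refjob_t *job)
{
  if (atomic_fetch_sub(&job->refcount, 1) == 1) {
    free(job);
    return 1;
  }
  return 0;
}
```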
Arttu Ylä-Outinen
bd8adff43a
Drop unused defines in threads.h
2017-04-12 03:41:07 -07:00
Arttu Ylä-Outinen
7ab0a7aff2
Fix semaphores on Mac
...
POSIX semaphores are deprecated on Mac. This commit replaces POSIX
semaphores by Grand Central Dispatch semaphores when building on Mac.
2017-04-12 03:41:02 -07:00
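A minimal sketch of a portable semaphore wrapper in the spirit of the Mac fix above. It uses Grand Central Dispatch semaphores on Apple platforms and unnamed POSIX semaphores elsewhere. The psem_* names are hypothetical. As the next commit notes, a semaphore, unlike a pthread mutex, may be posted by a different thread from the one that waits on it.

```c
#include <errno.h>

#ifdef __APPLE__
#include <dispatch/dispatch.h>
typedef dispatch_semaphore_t psem_t;
static int psem_init(psem_t *s, unsigned value)
{
  *s = dispatch_semaphore_create((long)value);
  return *s != NULL ? 0 : -1;
}
static void psem_wait(psem_t *s)    { dispatch_semaphore_wait(*s, DISPATCH_TIME_FOREVER); }
static void psem_post(psem_t *s)    { dispatch_semaphore_signal(*s); }
static void psem_destroy(psem_t *s) { dispatch_release(*s); }
#else
#include <semaphore.h>
typedef sem_t psem_t;
static int psem_init(psem_t *s, unsigned value) { return sem_init(s, 0, value); }
static void psem_wait(psem_t *s)
{
  /* sem_wait can be interrupted by a signal; retry in that case */
  while (sem_wait(s) != 0 && errno == EINTR) { }
}
static void psem_post(psem_t *s)    { sem_post(s); }
static void psem_destroy(psem_t *s) { sem_destroy(s); }
#endif
```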
Arttu Ylä-Outinen
26693e1402
Fix reliance on undefined behaviour in encmain
...
Pthread mutexes were used for synchronization in encmain by locking and
unlocking them from different threads. However, according to the POSIX
standard, unlocking a mutex from a different thread is undefined
behaviour. This commit replaces the mutexes by semaphores which can be
used from different threads.
2017-04-12 03:23:58 -07:00
Ari Lemmetti
47a9f0de04
Modify and use FILL_ARRAY macro to prevent warning on GCC 7
...
The following warning was given and is a false positive:
error: 'memset' used with length equal to number of elements without multiplication by element size [-Werror=memset-elt-size]
2017-04-11 14:04:25 +03:00
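GCC 7's -Wmemset-elt-size fires when memset is given a length equal to the element count of an array whose elements are wider than one byte. A hypothetical reconstruction of a FILL_ARRAY-style macro that avoids this is shown below. The real kvazaar macro may differ. It multiplies the count by the element size and passes the pointer through a void * temporary. Because memset fills bytes, 'val' is only meaningful as a byte pattern such as 0.

```c
#include <stddef.h>
#include <string.h>

/* Fill 'size' elements of array 'ar' with the byte value 'val'. */
#define FILL_ARRAY(ar, val, size) \
  do { \
    void *fill_ptr_ = (void *)(ar); \
    memset(fill_ptr_, (val), (size) * sizeof(*(ar))); \
  } while (0)
```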
Eemeli Kallio
f7e01b8ba1
Fixed error on rd=3
2017-04-05 13:27:14 +03:00
Eemeli Kallio
9f605152ae
Changed intra to use best rough cost when using inter and rd=2
2017-04-05 13:01:32 +03:00
Ari Lemmetti
33ce101ab5
Revert "Use sizeof(uint32_t) to avoid warning in GCC7."
...
Did not fix the problem.
This reverts commit e3c3e74926.
2017-04-03 20:21:33 +03:00
Ari Lemmetti
e3c3e74926
Use sizeof(uint32_t) to avoid warning in GCC7.
...
error: 'memset' used with length equal to number of elements without multiplication by element size [-Werror=memset-elt-size]
2017-04-03 19:16:09 +03:00
Arttu Ylä-Outinen
df359b8f95
Fix indentation in encode_coding_tree.c
...
Fixes indentation of a for loop that was causing a misleading
indentation warning on GCC.
Fixes #163.
2017-03-08 22:56:28 +09:00
Pierre-Loup Cabarat
2b8ce5e47c
Add intra prediction modes encryption
2017-03-06 17:27:39 +01:00
Arttu Ylä-Outinen
aae141f2d3
Fix order of frames with --debug
...
When the decoding and presentation orders of pictures are different
(with GOP), the frames in YUV debug output would be in the decoding
order. This commit changes the kvazaar command line program to store the
reconstructed pictures in a buffer so that they can be output in the
presentation order.
Fixes #101.
2017-02-28 14:09:24 +09:00
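The buffering described above can be sketched as a small reorder buffer indexed by POC. The names and the fixed capacity are hypothetical. The sketch assumes that POCs are unique, start at 0, never reset, and never run more than REORDER_CAP pictures ahead of the next one to be output.

```c
#include <stdbool.h>

#define REORDER_CAP 16  /* assumed maximum decode/presentation distance */

typedef struct {
  bool present[REORDER_CAP];
  int  frame[REORDER_CAP];  /* stand-in for a reconstructed picture */
  int  next_poc;            /* next POC to output in presentation order */
} reorder_buf_t;

/* Store a picture that arrived in decoding order, then move every picture
 * that is now next in presentation order to 'out'. Returns how many were output. */
static int reorder_push(reorder_buf_t *rb, int poc, int frame, int *out)
{
  int n = 0;
  rb->present[poc % REORDER_CAP] = true;
  rb->frame[poc % REORDER_CAP] = frame;
  while (rb->present[rb->next_poc % REORDER_CAP]) {
    rb->present[rb->next_poc % REORDER_CAP] = false;
    out[n++] = rb->frame[rb->next_poc % REORDER_CAP];
    rb->next_poc++;
  }
  return n;
}
```

For example, feeding POCs in the decoding order 0, 4, 2, 1, 3 outputs them as 0, then nothing, nothing, then 1 and 2, and finally 3 and 4.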
Arttu Ylä-Outinen
094b39e7fc
Refactor inter MV/merge candidate selection
...
Adds struct merge_candidates_t for holding the spatial and temporal
merge candidates. Changes functions with separate parameters for each
candidate to use the struct instead.
2017-02-22 15:56:36 +09:00
Arttu Ylä-Outinen
3409748a8f
Refactor inter MVP candidate selection
...
Adds helper function add_mvp_candidate.
2017-02-22 15:56:27 +09:00
Arttu Ylä-Outinen
ef6503c728
Refactor inter merge candidate selection
...
Adds helper function add_merge_candidate and replaces macro
CHECK_DUPLICATE with function is_duplicate_candidate.
2017-02-22 02:50:52 +09:00
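A minimal sketch of a duplicate check and a candidate-adding helper like the ones introduced above. The candidate type and both signatures are hypothetical simplifications. Each candidate carries one motion vector and one reference index per list. Following HEVC's spatial merge pruning, a new candidate is compared against a specific earlier candidate rather than against the whole list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced merge candidate. */
typedef struct {
  int16_t mv[2][2];    /* [list][x or y] */
  int8_t  ref_idx[2];  /* -1 when the list is unused */
} cand_t;

static bool is_duplicate_candidate(const cand_t *a, const cand_t *b)
{
  for (int list = 0; list < 2; ++list) {
    if (a->ref_idx[list] != b->ref_idx[list]) return false;
    /* motion vectors only matter for lists that are in use */
    if (a->ref_idx[list] >= 0 &&
        (a->mv[list][0] != b->mv[list][0] || a->mv[list][1] != b->mv[list][1]))
      return false;
  }
  return true;
}

/* Append 'cand' unless the list is full or it duplicates 'compare'
 * (which may be NULL). Returns the new candidate count. */
static int add_merge_candidate(const cand_t *cand, const cand_t *compare,
                               cand_t *list, int count, int max_count)
{
  if (count >= max_count) return count;
  if (compare && is_duplicate_candidate(cand, compare)) return count;
  list[count] = *cand;
  return count + 1;
}
```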
Arttu Ylä-Outinen
f12e09bc40
Refactor inter TMVP selection
...
Adds helper function add_temporal_candidate to inter.c.
2017-02-22 02:08:10 +09:00
Arttu Ylä-Outinen
4f88066740
Refactor MV and merge candidate selection
...
Replaces macros APPLY_MV_SCALING and CALCULATE_SCALE with helper
functions.
2017-02-22 01:14:16 +09:00
Arttu Ylä-Outinen
db08041d9a
Refactor inter TMVP selection
...
Merges three if-clauses to remove two levels of indentation.
2017-02-21 23:56:01 +09:00
Marko Viitanen
85e2a40da3
Clip scaled motion vectors, scale and td/tb values to appropriate limits
...
Fixes #158.
2017-02-20 15:40:20 +02:00
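The limits mentioned above come from the HEVC motion vector scaling derivation. It clips the POC distances td and tb to [-128, 127], the distance scale factor to [-4096, 4095] and the scaled vector to [-32768, 32767]. A minimal sketch of one vector component follows. The function name is hypothetical and this is not kvazaar's helper.

```c
#include <stdint.h>
#include <stdlib.h>

static int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

/* Scale one MV component by tb/td:
 *   tx  = (16384 + (|td| >> 1)) / td
 *   dsf = Clip3(-4096, 4095, (tb * tx + 32) >> 6)
 *   mv' = Clip3(-32768, 32767, Sign(dsf * mv) * ((|dsf * mv| + 127) >> 8))
 * td must be nonzero. '>>' on a negative int is implementation-defined
 * in C; this assumes an arithmetic shift, as on common compilers. */
static int16_t scale_mv(int16_t mv, int td, int tb)
{
  td = clip3(-128, 127, td);
  tb = clip3(-128, 127, tb);
  const int tx = (16384 + abs(td) / 2) / td;
  const int dsf = clip3(-4096, 4095, (tb * tx + 32) >> 6);
  const int prod = dsf * mv;
  const int mag = (abs(prod) + 127) >> 8;
  return (int16_t)clip3(-32768, 32767, prod < 0 ? -mag : mag);
}
```

With td == tb the vector is unchanged. Halving the distance (tb = 1, td = 2) halves it. Very large ratios saturate at the clipping limits.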
Ari Koivula
7369f25f64
Bump version to 1.1.0
2017-02-16 20:52:05 +02:00
Ari Lemmetti
b021d2244e
Reduce more unnecessary initializations.
2017-02-16 17:25:26 +02:00
Ari Lemmetti
acd12cba1e
Remove unnecessary memory initialization to zero
...
Values in the interval [last_scanpos, 0] are overwritten in the following for loop, except for the sig_coeff_inc value.
2017-02-16 16:48:48 +02:00
Ari Koivula
7ff33e1bf2
Fix default reference picture count
...
The default was 3, instead of the intended 1 of the medium preset.
2017-02-13 17:34:28 +02:00
Marko Viitanen
4251607c04
Fix a bug in TMVP reference POC list
2017-02-13 15:19:24 +02:00
Marko Viitanen
4270d451e6
Fixed some errors after rebase
2017-02-13 15:19:24 +02:00
Marko Viitanen
95effb00d0
Disable TMVP in frames with zero L0 references
2017-02-13 15:19:24 +02:00
Marko Viitanen
b4de1878be
Fixed TMVP scaling and candidate selection for B-frames
2017-02-13 15:19:23 +02:00
Marko Viitanen
23be633ad7
Added TMVP merge candidate scaling for L0
2017-02-13 15:19:23 +02:00
Marko Viitanen
e6aa1b9b9a
Renamed get_mv_cand_from_spatial() to get_mv_cand_from_candidates()
2017-02-13 15:19:23 +02:00
Marko Viitanen
1124bb5fd0
Cleaned up TMVP, mv candidate selection working, merge candidate selection not
2017-02-13 15:19:23 +02:00
Marko Viitanen
d65d2ec88d
WIP: add list of POCs used in the image when pushing to reference
2017-02-13 15:19:22 +02:00
Marko Viitanen
6a25cd3248
WIP: work on tmvp on inter
2017-02-13 15:19:22 +02:00
Marko Viitanen
e538a94eda
Enable TMVP with B-frames
2017-02-13 15:19:22 +02:00
Arttu Ylä-Outinen
363b8b49a2
Fix integer overflows with large resolutions
...
Limits video size so that the number of luma and chroma pixels can be
stored in an int. Fixes some integer overflows that resulted in
segmentation faults.
2017-02-12 11:40:13 +09:00
Arttu Ylä-Outinen
a5a925fc28
Replace timed waits by normal waits in threadqueue
...
Replaces calls to pthread_cond_timedwait with pthread_cond_wait in
threadqueue.c. Simplifies code, as there should be no need for the
timeout.
2017-02-11 15:42:03 +09:00
Arttu Ylä-Outinen
fd057498fc
Simplify kvz_config_alloc
2017-02-11 15:42:03 +09:00
Arttu Ylä-Outinen
7f7844caad
Fix finalizing uninitialized encoder states
...
Finalization functions for frame and tile encoder states accessed the
frame and tile fields of the encoder state even though they might be
NULL. This is the case when the initialization of an encoder state
fails. Fixed by adding NULL checks.
2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen
51786eda67
Drop redundant fields in encoder_control_t
...
Some of the fields in encoder_control_t were simply copies of the
corresponding fields in kvz_config. This commit drops the copied fields
in favor of using the fields in encoder_control_t.cfg directly.
2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen
6a178dee96
Fix leaking memory when --cqmfile given many times
...
Any previously allocated CQM file name was not freed when allocating
memory for the new file name.
2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen
63a567ad8a
Fix leaking memory when --roi given many times
...
Any previously allocated delta QP array was not freed when allocating
a new array.
2017-02-09 14:05:21 +09:00
Arttu Ylä-Outinen
bfd89136a4
Fix ROI delta QP array not getting freed
2017-02-09 13:23:55 +09:00
Arttu Ylä-Outinen
e78a8dfcf5
Copy the kvz_config passed to encoder_open
...
The kvz_config struct is created by the user but kvazaar keeps a pointer
to it. It is easy to break things by modifying the configuration outside
kvazaar. In addition, kvazaar modifies the struct even though it has
a const modifier.
This commit changes the field cfg in encoder_control_t to be a copy of
the kvz_config struct instead of a pointer, removing modifications to
the const struct and allowing users to do whatever they want with it
after opening the encoder.
2017-02-09 13:23:54 +09:00
Ari Koivula
b8e3513a23
Fix crash with sub-LCU frame sizes and WPP
...
The end of slice was being calculated incorrectly, which led to no tile
being created inside the slice, which led to an assert triggering.
This fixes the wrong end of slice calculation, but also disallows
wavefront rows from being created, if there would be only one.
The wavefront initialization code assumes there is always more than
one row, so the inter-frame dependency doesn't get added properly.
Fixes #153.
2017-02-08 21:41:30 +02:00
Ari Koivula
d893474bab
Fix encoder getting stuck on OS-X
...
Main thread was stuck looping on pthread_cond_timedwait because
the abs time given on OS-X had already passed and the wait
returned immediately without releasing the mutex to allow worker
threads to proceed.
The fix was to use gettimeofday, which returns real time instead
of monotonic time; real time is what pthread_cond_timedwait wants.
2017-02-02 17:27:46 +02:00
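A minimal sketch of building the absolute deadline for pthread_cond_timedwait as described above. The helper name is hypothetical. With a default-initialized condition variable, the deadline is compared against the realtime clock, so it is derived from gettimeofday rather than from a monotonic clock. The timeout is assumed to be non-negative.

```c
#include <sys/time.h>
#include <time.h>

/* Fill 'abstime' with "now + timeout_ms" on the realtime clock. */
static void deadline_from_now(struct timespec *abstime, long timeout_ms)
{
  struct timeval now;
  gettimeofday(&now, NULL);
  long long nsec = (long long)now.tv_usec * 1000 + (timeout_ms % 1000) * 1000000LL;
  abstime->tv_sec = now.tv_sec + timeout_ms / 1000 + (time_t)(nsec / 1000000000LL);
  abstime->tv_nsec = (long)(nsec % 1000000000LL);  /* keep tv_nsec in [0, 1e9) */
}
```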
Ari Koivula
4ceda1908b
Fix OS-X compiler warning
...
rdo.c:475:25: warning: absolute value function 'abs' given an argument of type 'int64_t' (aka 'long long') but has parameter
of type 'int' which may cause truncation of value [-Wabsolute-value]
current.cost = -abs(quant_cost_in_bits) + (bits << PRECISION_INC);
^
rdo.c:475:25: note: use function 'llabs' instead
current.cost = -abs(quant_cost_in_bits) + (bits << PRECISION_INC);
2017-02-01 18:09:17 +02:00
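The warning above is about passing an int64_t to abs, which takes an int. A minimal sketch of the suggested fix with a hypothetical helper. llabs takes a long long, which is at least 64 bits wide, so no truncation occurs. As with abs, the most negative value has no representable absolute value and must be avoided.

```c
#include <stdint.h>
#include <stdlib.h>

/* Absolute value of a 64-bit integer without truncating to int. */
static int64_t abs64(int64_t x)
{
  return (int64_t)llabs((long long)x);
}
```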
Ari Koivula
c7d536bbcd
Fix OS-X compiler warning
...
cfg.c:1024:74: warning: format specifies type 'size_t' (aka 'unsigned
long') but the argument has type 'unsigned long long'
[-Wformat]
fprintf(stderr, "Too large ROI size: %llu (maximum %zu).\n", size, SIZE_MAX);
2017-02-01 18:09:04 +02:00
Ari Koivula
4467506ef1
Add missing kvz_ prefix
2017-01-31 18:38:02 +02:00
Ari Koivula
ed3bd898fd
Remove Exp-Golomb lookup table
...
This table takes 256kB and isn't used very much. Au revoir!
2017-01-31 18:31:05 +02:00
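Without the lookup table, unsigned Exp-Golomb ue(v) codewords can be computed on the fly. Let n = floor(log2(value + 1)). The codeword is n zero bits followed by the (n + 1)-bit binary form of value + 1, for a total length of 2n + 1. A minimal sketch with a hypothetical function name (not necessarily how kvazaar replaced the table):

```c
#include <stdint.h>

/* Returns the codeword length in bits and stores the codeword,
 * right-aligned, in *code. The n leading zeros are implied by the length. */
static unsigned exp_golomb_ue(uint32_t value, uint64_t *code)
{
  const uint64_t v = (uint64_t)value + 1;
  unsigned n = 0;
  while ((v >> (n + 1)) != 0) ++n;  /* n = floor(log2(v)) */
  *code = v;
  return 2 * n + 1;
}
```

For example, 0 is coded as "1", 1 as "010", 2 as "011" and 3 as "00100".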
Ari Koivula
5513744d24
Merge branch 'slices'
2017-01-31 16:14:30 +02:00
Ari Koivula
52904d3e9f
Add --slices=tiles and --slices=wpp
...
This encapsulates tiles or WPP rows into their own slices, making
it possible to send them as soon as they are done, instead of waiting
for the other substreams to finish and coding the substream offsets
in the slice header.
2017-01-31 15:44:23 +02:00
Ari Koivula
0d4d0e869c
Add support for independent slices
...
Not used yet, but they work.
2017-01-31 15:11:50 +02:00
Ari Koivula
46ae382498
Fix bugs with slice header
...
These fixes allow more than one slice to be used to code a picture.
- Use correct number of bits to code the slice segment address.
- Don't write offset_len_minus1 for slices without substreams.
2017-01-31 14:01:59 +02:00
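For the first fix above, HEVC codes slice_segment_address (present for every slice segment except the first in a picture) with Ceil(Log2(PicSizeInCtbsY)) bits. A minimal sketch of that computation with a hypothetical helper name:

```c
/* Number of bits for slice_segment_address: Ceil(Log2(pic_size_in_ctbs)).
 * A picture consisting of a single CTU needs no bits. */
static unsigned slice_address_bits(unsigned pic_size_in_ctbs)
{
  unsigned bits = 0;
  while ((1u << bits) < pic_size_in_ctbs) ++bits;
  return bits;
}
```

For example, a 1920x1080 picture with 64x64 CTUs has 30 * 17 = 510 CTUs and therefore needs 9 bits.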
Ari Koivula
f1fc0de2bf
Write slice headers to the parent stream
...
Appending to the child stream doesn't work if the child is a leaf
slice state.
Simplifies flow by removing distinction between tile and slice. Now
that slice headers are written in the parent stream, there is zero
difference between tiles and slices from bitstream point of view.
2017-01-31 13:55:05 +02:00
Ari Koivula
04cd875b2c
Move substream finalization to LCU coding job
...
Having some of the termination bits in the LCU coding and some in the
substream finalization was needlessly confusing. Doing substream
finalization directly after LCU coding makes it easy to verify that the
finalization is done correctly.
Removes one job per WPP row from the job queue.
Removes kvz_cabac_flush, because I don't like bits being put into the
bitstream implicitly. Better to have it all in the open.
2017-01-31 13:01:57 +02:00
Ari Koivula
ead490b7b7
Write a new slice NAL for every slice
2017-01-31 12:36:18 +02:00
Ari Koivula
cd496bf50b
Move first_nal_in_au to encoder_state->frame
...
Needed for writing NALs from encoder_state_write_bitstream_children
2017-01-31 12:28:28 +02:00
Arttu Ylä-Outinen
1e6463c08b
Fix inter bipred search
...
When the number of merge candidates was five, biprediction search would
read past the bounds of the priority list arrays. Fixed to limit the
search to the first four candidates.
2017-01-31 18:23:12 +09:00
Ari Lemmetti
2c069a3e5f
Prevent unnecessary cu search
...
Prevent further analysis as soon as it is known that splitting cannot improve the cost
2017-01-30 16:21:41 +02:00
Arttu Ylä-Outinen
9b889c3fab
Fix reading ROI files
...
- Checks the return value of fopen when opening the ROI file. Fixes
a segfault when the file cannot be opened.
- Check that the width and height are positive. Fixes reading past the
end of the delta QP array in kvz_set_lcu_lambda_and_qp.
- Check for overflow in width * height. Fixes an overflow resulting in
a segfault.
- Properly check that fscanf succeeds. Fixes silently accepting ROI
files that are too short.
- Properly close the FILE pointer.
2017-01-29 18:57:27 +09:00
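The checks listed above can be sketched as a small reader. It assumes a file format of "width height" followed by width * height integer delta QPs. Both the format and the function names are assumptions for illustration. The sketch checks fopen, rejects non-positive sizes, guards width * height against overflow, verifies every fscanf and always closes the FILE.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc'ed array of delta QPs, or NULL on any error. */
static int *read_roi(FILE *f, int *width, int *height)
{
  int w, h;
  if (!f || fscanf(f, "%d %d", &w, &h) != 2) return NULL;
  if (w <= 0 || h <= 0) return NULL;                  /* non-positive size */
  if (w > INT_MAX / h) return NULL;                   /* w * h overflows int */
  if ((size_t)w * (size_t)h > SIZE_MAX / sizeof(int)) return NULL;
  int *dqp = malloc((size_t)w * (size_t)h * sizeof *dqp);
  if (!dqp) return NULL;
  for (int i = 0; i < w * h; ++i) {
    if (fscanf(f, "%d", &dqp[i]) != 1) {              /* file too short */
      free(dqp);
      return NULL;
    }
  }
  *width = w;
  *height = h;
  return dqp;
}

static int *read_roi_file(const char *path, int *width, int *height)
{
  FILE *f = fopen(path, "r");
  if (!f) return NULL;  /* cannot open: report failure instead of crashing */
  int *dqp = read_roi(f, width, height);
  fclose(f);
  return dqp;
}
```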
Arttu Ylä-Outinen
46c9a483c3
Fix inter search for small SMP and AMP blocks
...
The function search_pu_inter_ref incorrectly rounded the coordinates of
the block down to a multiple of 8 pixels. Small SMP and AMP blocks may
start at coordinates that are not multiples of 8. Fixed by removing the
rounding.
Fixes a failing assert when --mv-constraint is used with --smp or --amp.
2017-01-29 13:34:50 +09:00
Arttu Ylä-Outinen
fb10b56b82
Fix checking if a low delay GOP structure is used
...
Stops assuming that having cfg->gop_lowdelay set means that GOP
structure is used since it is possible that cfg->gop_lowdelay is true
but cfg->gop_len is zero. Adds checks for cfg->gop_len where needed.
Fixes a possible division by zero in kvz_encoder_feed_frame.
2017-01-28 21:56:00 +09:00
Arttu Ylä-Outinen
4f56b04239
Drop an unnecessary conditional
...
Drop a conditional for depth > MAX_DEPTH in search_cu. The depth cannot
be greater than MAX_DEPTH (== 3) since an earlier if-clause checks that
it is less than MAX_PU_DEPTH (== 4).
2017-01-28 21:35:27 +09:00
Ari Koivula
937a764987
Fix bug in --mv-constraint
...
Subpixel motion estimation returns the zero vector when no subpixel vector is
within the constraint. The fix is to not call subpixel motion estimation
when the integer vector is not within the constraint.
2017-01-26 09:55:57 +02:00
Ari Koivula
4a0121ac42
Add --roi parameter
...
Adds region of interest coding capability.
Works by reading a file of delta QP values which will then be applied
to each frame at LCU level.
2017-01-26 09:14:14 +02:00
Ari Koivula
6f61836989
Refactor kvz_rdoq_sign_hiding
...
Rename and reorder everything to make more sense.
- Moved input tables into their own struct and renamed them to what
they actually represent.
- Renamed pretty much every variable to comform to our style and
to make sense.
- Removed the lastCG stuff, as the function already gets passed the
last coeff anyway. (it was named width, what the hell?)
2017-01-19 23:58:17 +02:00
Ari Koivula
a85390d0ac
Clean up code using the fixed point frac bit tables
...
This is to prepare for changing the code using the floating point table
to use the fixed point table instead.
This also allows reducing the size of the fractional part, which was
useful for finding every place where the fixed-point representation
is relied upon.
2017-01-19 20:20:51 +02:00
Ari Koivula
24a69c7467
Refactor luma deblocking
...
Changes luma deblocking to use gather and scatter instead of reading
to and writing from here and there in memory. Should make them
faster and easier to vectorize, or at least cleaner.
Splits strong and weak luma deblocking to two functions, as they have
almost nothing in common.
2017-01-17 22:13:39 +02:00
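A minimal sketch of the gather/filter/scatter structure described above, applied to the HEVC strong luma filter. The helper names are hypothetical, and the filter decision and tc derivation are omitted. The eight samples p3..p0 and q0..q3 across an edge are gathered into a local array, with a step of 1 for a vertical edge or the picture stride for a horizontal edge. The array is then filtered and written back.

```c
#include <stdint.h>

static int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

/* 'edge' points at q0; v = {p3, p2, p1, p0, q0, q1, q2, q3}. */
static void gather8(const uint8_t *edge, int step, int v[8])
{
  for (int i = 0; i < 8; ++i) v[i] = edge[(i - 4) * step];
}

static void scatter8(uint8_t *edge, int step, const int v[8])
{
  for (int i = 0; i < 8; ++i) edge[(i - 4) * step] = (uint8_t)v[i];
}

/* HEVC strong luma filter; p3 and q3 are left unchanged. */
static void filter_luma_strong(int v[8], int tc)
{
  const int p3 = v[0], p2 = v[1], p1 = v[2], p0 = v[3];
  const int q0 = v[4], q1 = v[5], q2 = v[6], q3 = v[7];
  v[3] = clip3(p0 - 2 * tc, p0 + 2 * tc, (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3);
  v[2] = clip3(p1 - 2 * tc, p1 + 2 * tc, (p2 + p1 + p0 + q0 + 2) >> 2);
  v[1] = clip3(p2 - 2 * tc, p2 + 2 * tc, (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3);
  v[4] = clip3(q0 - 2 * tc, q0 + 2 * tc, (p1 + 2 * p0 + 2 * q0 + 2 * q1 + q2 + 4) >> 3);
  v[5] = clip3(q1 - 2 * tc, q1 + 2 * tc, (p0 + q0 + q1 + q2 + 2) >> 2);
  v[6] = clip3(q2 - 2 * tc, q2 + 2 * tc, (p0 + q0 + q1 + 3 * q2 + 2 * q3 + 4) >> 3);
}
```

Because the filter only ever sees the gathered array, the same code serves both edge directions, which is what makes this layout easier to vectorize.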
Ari Koivula
4cb2fca924
Refactor deblock decision
2017-01-17 19:34:17 +02:00
Arttu Ylä-Outinen
05794c3548
Add missing static to function lambda_to_qp
2017-01-11 15:53:55 +09:00
Arttu Ylä-Outinen
ee518e8ac4
Take header bits into account in rate control
2017-01-11 15:53:55 +09:00
Arttu Ylä-Outinen
c219d3cd94
Fix deblock when CU QP delta is enabled
...
Fixes deblock functions so that they use the correct QP for the filtered
edge. Adds field qp to cu_info_t.
2017-01-11 15:53:22 +09:00
Arttu Ylä-Outinen
82a98180e4
Clip LCU lambda to reduce quality fluctuation
...
Limits lambdas for each LCU based on the computed lambda from the
previous frame and the frame-level lambda.
2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen
93172fd251
Use separate alpha, beta and lambda for each LCU
...
Changes rate control to use the alpha and beta values stored in
lcu_stats_t instead of the frame-level values when selecting lambda and
QP for an LCU.
2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen
3af4e9cc8a
Allocate bits separately for each LCU
...
Bits are allocated based on the costs of the LCUs in the previous
completely coded frame.
Breaks deblock when rate control is used.
2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen
ff5e5ec6d4
Record info about coded LCUs
...
Adds field lcu_stats to encoder_state_config_frame_t. The following data
is recorded for each LCU:
- number of bits
- squared cost
- used lambda value
- alpha parameter used for rate control
- beta parameter used for rate control
2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen
2a4243acbe
Refactor rate control
...
Moves all code related to setting QP and lambda values to rate_control
module.
2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen
71633889ce
Enable CU QP delta when using rate control
...
When rate control is enabled, enable cu_qp_delta_enabled_flag in PPS
with diff_cu_qp_delta_depth set to 0. Also adds code for writing the QP
deltas and a new cabac context.
2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen
640ff94ecd
Use separate lambda and QP for each LCU
...
Adds fields lambda, lambda_sqrt and qp to encoder_state_t. Drops field
cur_lambda_cost_sqrt from encoder_state_config_frame_t and renames
cur_lambda_cost to lambda.
2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen
435c387357
Refactor rate control
...
- Defines MIN_LAMBDA and MAX_LAMBDA constants.
- Moves resetting state->frame->cur_gop_bits_coded to rate_control.c.
- Changes gop_allocate_bits to return the number of bits allocated like
pic_allocate_bits does.
2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen
6c4f2d196a
Move fields from encoder_state_t to frame
...
Moves fields prepared and frame_done from encoder_state_t to
encoder_state_config_frame_t.
2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen
97863cdaa2
Fail encoder init when CQM file cannot be opened
2017-01-08 19:17:43 +09:00
Arttu Ylä-Outinen
db5e750c7f
Fix --threads=auto
...
When --threads=auto was given on the command line, cfg->threads was
actually set to zero, disabling threads altogether. Fixed to set
cfg->threads to -1, so that the number of threads is chosen
automatically.
2017-01-08 17:58:22 +09:00
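The fixed option handling can be sketched as follows. Only the mapping of "auto" to -1 comes from the commit message; the function name is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: "auto" means "choose automatically" (-1), while an
 * explicit number, including 0 (threads disabled), is kept as given. */
static int parse_threads(const char *value)
{
  if (strcmp(value, "auto") == 0) return -1;
  return atoi(value);
}
```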
Ari Koivula
a9e45efcfc
Add a fast lane for byte-aligned bitstream writes
...
The CABAC engine only writes to the bitstream when it has a full byte.
These writes are also always byte-aligned, so there is no need to even
check for stream alignment.
Speedup was around 3% with ultrafast and low QP.
2016-12-23 17:01:44 +02:00
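Here is a sketch of the fast lane, built on a hypothetical MSB-first bit writer that is not Kvazaar's bitstream_t. The general path shifts in bits one at a time. The byte path stores the byte directly because the caller guarantees the stream is byte-aligned.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical minimal bit writer. */
typedef struct {
  uint8_t buf[256];
  size_t len;      /* completed bytes in buf */
  uint32_t cache;  /* pending bits, MSB first */
  int cur_bits;    /* number of pending bits, 0..7 */
} bitwriter_t;

/* General path: works at any bit position. */
static void put_bits(bitwriter_t *bw, uint32_t value, int bits)
{
  for (int i = bits - 1; i >= 0; --i) {
    bw->cache = (bw->cache << 1) | ((value >> i) & 1);
    if (++bw->cur_bits == 8) {
      bw->buf[bw->len++] = (uint8_t)bw->cache;
      bw->cache = 0;
      bw->cur_bits = 0;
    }
  }
}

/* Fast lane: caller guarantees cur_bits == 0, as with CABAC output,
 * so no alignment check is done. */
static void put_byte(bitwriter_t *bw, uint8_t byte)
{
  bw->buf[bw->len++] = byte;
}
```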
Jaakko Laitinen
deb63f735f
Fix gop disabling
2016-12-20 14:25:13 +02:00
Ari Lemmetti
70a52f0e48
10-bit: add missing bit depth adjustment to ssd
2016-11-17 19:28:04 +02:00
Ari Koivula
fa078102f1
Fix 32bit compilation
...
Got a warning about implicit cast from uint64_t to void*.
2016-11-17 17:53:57 +02:00
Ari Koivula
5ceec06bd3
Merge pull request #148 from Venti-/crypto
...
Crypto
2016-11-16 21:33:55 +02:00
Ari Lemmetti
c31207ea7d
Optimize intra reference building
...
- Add function with reduced logic for the most common case
2016-11-16 18:28:42 +02:00
Ari Koivula
24f2a23ef8
Remove unnecessary crypto state
...
The frame does not need its own crypto state, since it always has at
least one sub tile.
2016-11-16 13:58:41 +02:00
Ari Koivula
8951e34fd2
Change crypto.h stubs to print instead of assert
2016-11-16 13:58:41 +02:00
Wassim Hamidouche
ea82c38906
correct memory allocation
2016-11-16 12:35:28 +02:00
Wassim Hamidouche
da3e2d1d07
resolve parallel encryption
2016-11-16 12:35:28 +02:00
Ari Koivula
b8a618e666
Fix problems with >8 bit input
...
Enforce bit depth promised by --input-bitdepth to avoid crashes when
larger values are provided.
Do the endianness byte swap for all bytes when the buffer gets extended
to multiple of 8 pixels, and not just the number of input pixels.
Don't swap bytes on a little-endian system.
2016-11-13 19:58:54 +02:00
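One way to enforce --input-bitdepth and handle byte order is sketched below. The sketch assumes samples are stored little-endian in the file, so it swaps them only on big-endian hosts, and it clamps each sample to the promised bit depth. Clamping (rather than, say, masking) and the function names are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Returns 1 on a little-endian host. */
static int host_is_little_endian(void)
{
  const uint16_t probe = 1;
  return *(const uint8_t *)&probe == 1;
}

/* Hypothetical: convert little-endian 16-bit file samples to host order and
 * clamp them to input_bitdepth. `count` should cover the whole padded
 * buffer (multiple of 8 pixels), not only the input pixels. */
static void fix_input_samples(uint16_t *samples, size_t count, int input_bitdepth)
{
  const uint16_t max_val = (uint16_t)((1u << input_bitdepth) - 1);
  const int swap = !host_is_little_endian();
  for (size_t i = 0; i < count; ++i) {
    uint16_t s = samples[i];
    if (swap) s = (uint16_t)((s >> 8) | (s << 8));
    samples[i] = s > max_val ? max_val : s;
  }
}
```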
Ari Koivula
2c005cda25
Fix bug with sub-pixel motion estimation in tiles
...
The width of the tile was being used to index the frame pixel buffer
instead of the width of the buffer.
2016-11-07 15:53:52 +02:00
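The bug came from stepping between rows with the wrong pitch. A hypothetical accessor for a pixel inside a tile shows the correct indexing.

```c
#include <stdint.h>

/* Hypothetical accessor: a tile is a window into the frame buffer, so rows
 * are stepped by the buffer stride (frame width), not by the tile width. */
static uint8_t pixel_at(const uint8_t *frame, int stride,
                        int tile_x, int tile_y, int x, int y)
{
  return frame[(tile_y + y) * stride + (tile_x + x)];
}
```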
Ari Koivula
78a28e0338
Reformat --help message
...
- Reduce indentation to 6 spaces
- Word wrap everything to under 80 characters
- Remove defaults from options covered by presets
- Add a dash in front of argument descriptions
- Add --(no-) to names of parameters that accept it and remove mention
of enabling or disabling
- Add executable and scripts as a dependency to make docs
2016-11-04 15:40:28 +02:00
Ari Koivula
d18de19d8a
Fix DTS and PTS not being passed on through lib API
...
Fixes "cur_dts is invalid" warning from FFmpeg.
2016-10-28 19:05:47 +03:00
Ari Koivula
0c41c2ebd6
Make CLI set PTS for each input picture
...
This value is not represented in the HEVC bitstream, which is why it
was not set previously. FFmpeg sets and needs it however, so make the
CLI set it as well to make sure we handle it correctly.
2016-10-28 19:03:03 +03:00
Ari Koivula
5bf745460d
Re-categorize options in the help message
...
- Move VUI stuff to the bottom
- Merge Parallel processing, WPP, Tiles and slices
- Add more categories for the other options
2016-10-27 03:26:15 +03:00
Ari Koivula
cb6672b452
Disable WPP when Tiles are enabled
...
Closes #142.
2016-10-27 02:07:10 +03:00
darealshinji
488d042e5f
Bump KVZ_VERSION
2016-10-25 12:32:13 +02:00
Ari Lemmetti
29153ed503
Remove unused variable
2016-10-21 17:28:42 +03:00
Ari Lemmetti
778e46dfd8
Add AVX2 version of SSD
2016-10-21 15:07:53 +03:00
Ari Lemmetti
6f5d7c9e06
Move SSD to strategies
2016-10-21 15:07:23 +03:00
Ari Lemmetti
89b941eab4
Fix typo
2016-10-21 15:07:02 +03:00
Alexis Ballier
1dcc993743
Include i386 & i486 for compiling intel asm.
...
x86_64-pc-linux-gnu-gcc -m32, which I use for building 32-bit libraries on amd64, defines only __i386__.
2016-10-14 18:07:37 +02:00
Arttu Ylä-Outinen
5fb7afe8c4
Add --implicit-rdpcm command line parameter.
...
Makes it possible to use lossless coding without implicit residual DPCM.
2016-10-03 20:01:55 +09:00
Arttu Ylä-Outinen
5affc0f527
Use implicit RDPCM in lossless mode.
...
Sets implicit RDPCM flag in SPS when lossy coding is disabled and
applies DPCM to intra residual when prediction mode is horizontal or
vertical.
2016-10-03 19:31:38 +09:00
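Implicit RDPCM on an intra residual can be sketched as follows. With horizontal prediction, each residual sample has its left neighbor subtracted; with vertical prediction, it has the neighbor above subtracted. The function name and the square block layout are assumptions.

```c
#include <stdint.h>

/* Hypothetical forward residual DPCM on a width x width block, in place.
 * Iterating from the far corner keeps each neighbor's original value
 * available when it is subtracted. */
static void rdpcm_forward(int16_t *res, int width, int horizontal)
{
  for (int y = width - 1; y >= 0; --y) {
    for (int x = width - 1; x >= 0; --x) {
      if (horizontal && x > 0)
        res[y * width + x] -= res[y * width + x - 1];
      else if (!horizontal && y > 0)
        res[y * width + x] -= res[(y - 1) * width + x];
    }
  }
}
```

The decoder reverses this with a running sum in the same direction, so the operation stays lossless.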
Ari Koivula
016dbe0894
Further refine presets
...
The rd-complexity of slow presets is better with a less aggressive GOP.
Adding the GOP as part of the preset improved BDRate enough that it
no longer made sense to have veryslow target the best BDRate.
Instead, push that responsibility to placebo by making it a little bit
faster.
2016-09-29 17:35:12 +03:00
Ari Koivula
31c5ff0f16
Add cross-platform core number detection
...
Well, turns out pthread_num_processors_np isn't standard so we need to
do this crap. Threw in hyper threading detection as a bonus.
2016-09-29 00:03:21 +03:00
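Here is a minimal portable sketch of logical core detection. It uses sysconf(_SC_NPROCESSORS_ONLN) on POSIX-like systems and GetSystemInfo on Windows. It is not Kvazaar's implementation, and it does not attempt hyper-threading detection.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper: number of online logical processors, at least 1. */
static int logical_cores(void)
{
#ifdef _WIN32
  SYSTEM_INFO info;
  GetSystemInfo(&info);
  return (int)info.dwNumberOfProcessors;
#else
  long n = sysconf(_SC_NPROCESSORS_ONLN);
  return n > 0 ? (int)n : 1;
#endif
}
```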
Ari Koivula
8c7351eac8
Fix lp-gop with depth 1
...
GOPs with depth 1 had the same structure as those with depth 2:
g4d3t1 = 3 2 3 1
g4d2t1 = 2 2 2 1
g4d1t1 = 2 2 2 1
It now results in the correct:
g4d1t1 = 1 1 1 1
2016-09-29 00:03:21 +03:00
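A small rule reproduces the layer patterns listed above:

- The last picture of the GOP is on layer 1.
- With depth 1, every picture is on layer 1.
- Otherwise, the layer is the depth minus the number of trailing zero bits in the 1-based position, but never below 2.

This rule is a hypothetical reconstruction that matches the three examples. It is not necessarily how Kvazaar computes the layers.

```c
/* Hypothetical: layer of 1-based position `pos` in a low-delay GOP of
 * length `len` and the given depth. */
static int lp_gop_layer(int pos, int len, int depth)
{
  if (depth == 1 || pos == len) return 1;  /* last picture is layer 1 */
  int tz = 0;                              /* trailing zero bits of pos */
  while (((pos >> tz) & 1) == 0) ++tz;
  int layer = depth - tz;
  return layer < 2 ? 2 : layer;
}
```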
Ari Koivula
a395aeaac9
Set default settings to those of --preset=medium
2016-09-29 00:03:21 +03:00
Ari Koivula
4388fe0d30
Set presets to ratedistortion-complexity optimized versions
2016-09-29 00:03:20 +03:00
Ari Koivula
facb1e16df
Use -p64 -q22 and --gop=lp-g4d3t1 by default
...
Coding inter without GOP of any kind really isn't a very sensible
default. Defaulting to B-GOP of some kind would be better,
but lp-gop is more robust for now.
2016-09-29 00:03:20 +03:00
Ari Koivula
d7391a9593
Improve default for number of parallel frames
2016-09-29 00:03:20 +03:00
Ari Koivula
19d423ab29
Use all available cores by default
2016-09-29 00:03:20 +03:00
Ari Koivula
3f138f087a
Allow non-gop-length --period for lp-gop
2016-09-29 00:03:19 +03:00
Ari Koivula
16790c9f15
Remove number of references from --gop=lp syntax
...
The number of references should be part of the presets, so gop should
be defined separately.
2016-09-29 00:03:19 +03:00
Ari Koivula
cbfa824d1a
Merge branch 'simd'
2016-09-27 20:49:45 +03:00
Ari Koivula
14a7bcba25
Use a faster function for clipped inter SAD
...
Use the vectorized general SSE41 inter SAD in AVX reg_sad for shapes
for which we don't have AVX versions yet.
Also improves speed of --smp and --amp a lot. Got a 1.25x speedup for:
--preset=ultrafast -q 27 --gop=lp-g4d3r3t1 --me-early-termination=on --rd=1 --pu-depth-inter=1-3 --smp --amp
* Suite speed_tests:
-PASS inter_sad: 0.898M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 2.503M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
-PASS inter_sad: 115.054M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 133.577M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
2016-09-27 20:48:30 +03:00
Arttu Ylä-Outinen
4313e56c2d
Add --no-rdoq-skip command line switch
2016-09-11 17:40:16 +09:00
Ari Koivula
a7a33b08ec
Remove --slice-addresses from usage message
...
And give a warning if it's used.
Slices will have to be implemented at some point, but they aren't yet
so let's not advertise them.
2016-09-10 21:06:00 +03:00
Eemeli Kallio
f41e428e5f
Removed kvz_skip_unnecessary_rdoq and reworked --rdoq-skip to skip 4x4 blocks when it is on.
2016-09-09 10:26:07 +03:00
Eemeli Kallio
ed9c0b0416
RDOQ reworked in rdo.c. rdoq_signhide now skips coeffs that are after best_last_idx.
2016-09-09 10:16:51 +03:00
Ari Koivula
02cd17b427
Add faster AVX inter SAD for 32x32 and 64x64
...
Add implementations for these functions that process the image line by
line instead of using the 16x16 function to process block by block.
The 32x32 is around 30% faster, and 64x64 is around 15% faster,
on Haswell.
PASS inter_sad: 28.744M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec)
PASS inter_sad: 7.882M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)
to
PASS inter_sad: 37.828M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec)
PASS inter_sad: 9.081M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)
2016-09-01 21:36:39 +03:00
Ari Koivula
d0512d25c6
Use fixed point in get_mvd_coding_cost
2016-08-30 21:37:12 +03:00
Ari Koivula
ec7507a935
Further optimize get_ep_ex_golomb_bitcost
...
Unrolled 16-bit log2 calculation.
2016-08-30 21:37:01 +03:00
Ari Koivula
a4ba794587
Optimize get_ep_ex_golomb_bitcost
...
Arrange the decision tree such that there are only 3 branches on the
most common paths and the more likely branch is always fall-through.
A profile guided optimization pass would probably do something similar.
2016-08-30 05:24:16 +03:00
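Both get_ep_ex_golomb_bitcost optimizations can be illustrated with the bit cost of a 0th-order Exp-Golomb code, 2 * floor(log2(v + 1)) + 1. In the sketch, the first few branches resolve small values and everything else goes through an unrolled 16-bit log2. The function names are hypothetical, and the sketch covers only EG0 with v + 1 <= 0xFFFF.

```c
#include <stdint.h>

/* floor(log2(x)) for 1 <= x <= 0xFFFF, unrolled into four comparisons. */
static int log2_u16(uint32_t x)
{
  int r = 0;
  if (x >= 1u << 8) { x >>= 8; r += 8; }
  if (x >= 1u << 4) { x >>= 4; r += 4; }
  if (x >= 1u << 2) { x >>= 2; r += 2; }
  if (x >= 1u << 1) { r += 1; }
  return r;
}

/* Hypothetical EG0 bit cost; the common small values take the first
 * branches and fall through quickly. */
static int ep_ex_golomb0_bitcost(uint32_t v)
{
  if (v == 0) return 1;
  if (v < 3) return 3;
  if (v < 7) return 5;
  return 2 * log2_u16(v + 1) + 1;
}
```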
Ari Koivula
82cfab58f8
Improve fast mvd coding cost estimation
...
A lot of time is being taken up by this function on ultrafast, and it
doesn't do a very good job. This change aims to both simplify the
logic and make the estimate better.
The logic is simplified by using a lookup table for the mvd bit cost
step function instead of mimicking the binarization process. The
estimate is made better by using fractional CABAC bit costs.
The new function returns the same results as
kvz_get_mvd_coding_cost_cabac, but is also faster than the old
function.
2016-08-30 04:55:09 +03:00
Ari Koivula
d31be8eb27
Make mvd_coding_cost functions take const cabac
2016-08-30 04:46:46 +03:00
Ari Koivula
64d631c174
Fix 8bit to 10bit input conversion regression
2016-08-25 22:09:40 +03:00
Ari Koivula
27789125d8
Fix input bit depth conversion
...
The input was being shifted to the wrong direction.
2016-08-25 22:05:25 +03:00
Ari Koivula
4ec039004b
Add monochrome encoding
...
Write bitstream without chroma when encoding with --input-format=P400.
This reduces bitstream size by 0-1%, compared to coding monochrome in
420 format, and speeds up encoding slightly due to not processing
chroma.
2016-08-25 20:15:26 +03:00
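The sketch below shows how the chroma format determines chroma plane size. With P400 there is no chroma plane at all, which is where the size and speed savings come from. The enum and the function are hypothetical.

```c
/* Hypothetical enum for the chroma formats selectable via --input-format. */
typedef enum { CSP_400, CSP_420, CSP_422, CSP_444 } chroma_format_t;

/* Samples in one chroma plane; 0 for monochrome (P400). */
static long chroma_plane_size(chroma_format_t fmt, int width, int height)
{
  switch (fmt) {
    case CSP_400: return 0;
    case CSP_420: return (long)(width / 2) * (height / 2);
    case CSP_422: return (long)(width / 2) * height;
    case CSP_444: return (long)width * height;
  }
  return 0;
}
```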
Ari Koivula
c5b70cf812
Add chroma format support to yuv_t
2016-08-24 19:20:53 +03:00
Ari Koivula
032ed30ff4
Add chroma format support to kvz_picture
...
Add picture_alloc_csp to libkvz api to allocate pictures with chroma
format different from 420.
2016-08-24 19:20:53 +03:00
Ari Koivula
48ccc26839
Add --input-format and --input-bitdepth
...
Adds reading of 10 bit input for 10-bit encoding.
2016-08-24 19:20:53 +03:00
Ari Koivula
cc08073615
Refactor some indexing weirdness in init_lcu_t
...
I thought there might be a bug in this so I cleaned it up.
2016-08-24 19:12:48 +03:00
Ari Koivula
b6d674d66e
Refactor integer vector inter prediction
...
This code was pretty bad, so I cleaned it up a bit.
2016-08-24 19:09:26 +03:00
Ari Lemmetti
28c4174d0e
Fix incorrect shuffle parameters
...
_MM_SHUFFLE uses reverse order
2016-08-23 19:40:46 +03:00
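_MM_SHUFFLE lists its lane selectors from the highest destination lane to the lowest. The sketch below defines a local macro with the same bit layout, plus a scalar model of _mm_shuffle_epi32 (pshufd), so it runs without SSE. Both names are hypothetical.

```c
#include <stdint.h>

/* Same bit layout as _MM_SHUFFLE(fp3, fp2, fp1, fp0): the FIRST argument
 * selects the source for the HIGHEST destination lane. */
#define SHUFFLE_IMM(fp3, fp2, fp1, fp0) \
  (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))

/* Scalar model of pshufd: lane i takes src[(imm >> 2i) & 3]. */
static void shuffle_epi32_model(const int32_t src[4], int32_t dst[4], int imm)
{
  for (int i = 0; i < 4; ++i) dst[i] = src[(imm >> (2 * i)) & 3];
}
```

SHUFFLE_IMM(3, 2, 1, 0) is the identity, and SHUFFLE_IMM(0, 1, 2, 3) reverses the four lanes.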
Ari Lemmetti
ce77bfa15b
Replace KVZ_PERMUTE with _MM_SHUFFLE
...
The same exact macro already exists
2016-08-22 19:08:46 +03:00
Jovasa
68eef660bd
Fixed search around mv_in in fullsearch not being saved.
2016-08-19 15:19:29 +03:00
Eemeli Kallio
99d8b9abeb
Changed skip_rdoq name to kvz_skip_unnecessary_rdoq. Changed the order it uses when it goes through CGs and tuned its sum calculation.
2016-08-18 14:02:56 +03:00
Eemeli Kallio
1fb4755f31
Added rdoq-skip to quant-generic.c
2016-08-18 12:17:54 +03:00
Eemeli Kallio
d20ac03ca2
Added --rdoq-skip option
2016-08-18 12:17:53 +03:00
Marko Viitanen
83cf801664
Fixed MV constraint condition in bipred
2016-08-18 08:53:17 +03:00
Marko Viitanen
5ae1c595f2
Fixed slice_temporal_mvp_enabled_flag and disabled TMVP with tiles
...
- slice_temporal_mvp_enabled_flag should be signalled also with non-IDR I-slices
2016-08-10 14:51:41 +03:00
Marko Viitanen
5326519182
TMVP cleanup and const qualifier fixes
2016-08-10 14:10:43 +03:00
Marko Viitanen
f40907260d
Added config parameter for TMVP and cmdline option --no-tmvp
...
- Enabled by default
- Cannot be used with GOP at the moment
2016-08-10 14:09:29 +03:00
Marko Viitanen
fd52dac1f7
Fixed TMVP scaling
2016-08-10 14:09:28 +03:00
Marko Viitanen
c664bc8cf7
Added flag collocated_ref_idx to the slice header
2016-08-10 14:09:28 +03:00
Marko Viitanen
c5f2611a38
Fixes for TMVP to work with the new CU array
2016-08-10 14:09:28 +03:00
Marko Viitanen
d85af5755b
TMVP working when only 1 ref frame
2016-08-10 14:09:28 +03:00
Marko Viitanen
39f0165efe
Fix a bug in TMVP, the reference cu_array was being overwritten
2016-08-10 14:09:27 +03:00
Marko Viitanen
adab8c327e
Clean TMVP code
2016-08-10 14:09:20 +03:00
Marko Viitanen
5fa8226ac9
Temporal merge candidate selection
2016-08-10 14:09:20 +03:00
Marko Viitanen
f83042f4a1
Temporal MV candidate selection
2016-08-10 14:09:19 +03:00
Marko Viitanen
f8671581e3
Implemented function kvz_inter_get_temporal_merge_candidates()
2016-08-10 14:09:19 +03:00
Marko Viitanen
2956bdb379
Added flag slice_temporal_mvp_enabled_flag
2016-08-10 14:09:19 +03:00
Arttu Ylä-Outinen
2a946bd88e
Rename encoder_state_t.global to frame
...
"Frame" is more accurate than "global" since when OWF is used, encoder
states for each frame have their own struct.
2016-08-10 13:22:36 +09:00
Arttu Ylä-Outinen
5fbb0a8c27
Fix includes
2016-08-10 13:05:40 +09:00
Arttu Ylä-Outinen
aabf6ca3ee
Extract encoding code from encoderstate.c
...
Moves functions kvz_encode_coding_tree and kvz_encode_coeff_nxn from
encoderstate.c to encode_coding_tree.c.
2016-08-09 22:16:50 +09:00
Arttu Ylä-Outinen
803f29be8f
Remove reconstructed picture allocation in lossless.
...
Changes encoder_set_source_picture to set the reconstructed picture to
a copy of the source picture instead of allocating a new picture when
lossless coding is used.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen
aaec473a19
Refactor encoder state initialization.
...
- Moves allocation of the reconstructed picture after the source picture
is set.
- Extracts main state initialization to a separate function from
encoder_state_new_frame.
- Changes kvz_encoder_feed_frame to return the frame.
- Renames some functions to better match their purpose.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen
cd7024b3a5
Skip computing SSD when using lossless coding.
...
The SSD is always zero since it is lossless.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen
fbbe5d1844
Use kvz_pixels_calc_ssd for SSD in search.c.
...
Replaces loops for computing SSDs by calling kvz_pixels_calc_ssd in
search.c.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen
22cc97ffb1
Fix missing field initializers.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen
06b82bf888
Disable filters, trskip and signhide in lossless.
...
When lossless coding is used, deblock and SAO are skipped, transform
skip flag is not written and sign hiding is not used.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen
97451ec401
Align assignments in encoder.c.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen
1dc94663c3
Bypass transform and quantization with --lossless.
...
When --lossless is given, set cu_transquant_bypass_flag for every CU and
bypass transform and quantization by directly copying reference pixels
to reconstruction and the residual to coefficients.
2016-08-03 14:25:08 +09:00
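The bypass path for one block can be sketched as follows. The residual goes straight into the coefficient buffer, and the source pixels are copied to the reconstruction, which equals prediction plus residual. The names and the flat 8-bit layout are hypothetical.

```c
#include <stdint.h>

/* Hypothetical transquant bypass for n pixels: no transform, no
 * quantization, and an exact reconstruction. */
static void tq_bypass_block(const uint8_t *src, const uint8_t *pred, int n,
                            int16_t *coeff, uint8_t *recon)
{
  for (int i = 0; i < n; ++i) {
    coeff[i] = (int16_t)(src[i] - pred[i]);  /* residual copied as-is */
    recon[i] = src[i];                       /* == pred[i] + coeff[i] */
  }
}
```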
Arttu Ylä-Outinen
2113b0182d
Enable PPS-level tq bypass flag with --lossless.
...
Sets transquant_bypass_enable_flag to true in PPS when --lossless is
given.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen
a5897bbece
Make cabac context initialization tables static.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen
23e7d9bb37
Add --lossless command line parameter.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen
5372ea432f
Update README and manpage.
2016-08-03 14:25:08 +09:00
Ari Lemmetti
6bcba004ff
Comment out to fix unused code error on clang.
2016-07-14 14:12:16 +03:00
Ari Lemmetti
c0979ebdcb
Implement AVX2 luma sampling
2016-07-14 12:53:02 +03:00
Ari Lemmetti
6244560426
Add avx2 strategy for kvz_filter_frac_blocks_luma.
2016-07-14 12:53:02 +03:00
Ari Lemmetti
9c4e9e049b
Load only what is needed. Eliminate latency from hadds.
2016-07-14 12:53:01 +03:00
Ari Lemmetti
7f71cb423a
Check 4 fractional pixel positions simultaneously
2016-07-14 12:52:24 +03:00
Ari Lemmetti
ad445ab8a1
Transition to kvz_filter_frac_blocks_luma
2016-07-14 12:51:02 +03:00
Ari Lemmetti
fccfbd2f28
Add strategy for kvz_filter_frac_blocks_luma
2016-07-14 12:51:02 +03:00
Ari Lemmetti
e9c3074d32
Add buffers and definitions for upcoming filtering
...
Samples are to be filtered in separate blocks instead of
making one big picture with interpolated pixels
2016-07-14 12:51:02 +03:00
Ari Lemmetti
7afe7e963b
Use fme_level to control the search accuracy.
2016-07-14 12:51:01 +03:00
Ari Lemmetti
5fa323bf25
Skip searching best hpel twice. Make hpel and qpel loops similar.
2016-07-14 12:51:01 +03:00
Ari Lemmetti
bc98a9affa
Change the search order to suit lighter fme search
2016-07-14 12:51:01 +03:00
Ari Lemmetti
2b0c8db349
Add quad satd for avx2
2016-07-14 12:50:24 +03:00
Ari Lemmetti
0ff69fd6f8
Add any size multi satd
2016-07-14 12:48:37 +03:00
Ari Lemmetti
d17b9e7d6e
Allow subme parameters 0-4
...
Update usage, presets, defaults, lib version
2016-07-12 19:49:38 +03:00
Arttu Ylä-Outinen
62ad57d0bf
Fix kvz_image_list_add for zero-sized lists.
...
When a list does not have space for the new element, its size is
doubled. If the size of the list is zero, it would not be resized. Fixed
to always resize the list so that the new element can be added.
2016-06-22 13:35:16 +09:00
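The fix amounts to special-casing a zero capacity before doubling. Here is a sketch using a hypothetical int list in place of the image list.

```c
#include <stdlib.h>

/* Hypothetical growable list. */
typedef struct { int *items; size_t size; size_t used; } int_list_t;

/* Doubling a zero capacity would stay zero, so start from 1 instead. */
static int int_list_add(int_list_t *list, int item)
{
  if (list->used >= list->size) {
    size_t new_size = list->size > 0 ? list->size * 2 : 1;
    int *p = realloc(list->items, new_size * sizeof *p);
    if (!p) return 0;
    list->items = p;
    list->size = new_size;
  }
  list->items[list->used++] = item;
  return 1;
}
```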
Arttu Ylä-Outinen
433e528af7
Drop unused variable in search_pu_inter.
...
Removes unused variable max_px_below_lcu.
2016-06-22 13:35:16 +09:00
Arttu Ylä-Outinen
7836ff6ec9
Drop unused functions.
...
Removes functions kvz_coefficients_calc_abs, kvz_intra_rdo_cost_compare
and kvz_rdo_cost_intra which are no longer used.
2016-06-22 13:35:15 +09:00
Arttu Ylä-Outinen
e4b5840f56
Add parentheses around macro arguments in cabac.h.
2016-06-22 13:35:15 +09:00
Arttu Ylä-Outinen
a387b74e51
Fix resolution auto-detection.
...
Only try to guess the resolution from filename when neither width nor
height is given.
2016-06-22 13:35:15 +09:00
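Below is a sketch of the corrected guard. It uses a hypothetical helper that looks for the first "<digits>x<digits>" substring in the file name.

```c
#include <stdio.h>

/* Hypothetical: guess width and height from the file name only when
 * neither was given on the command line (both still 0). */
static int guess_resolution(const char *name, int *width, int *height)
{
  if (*width != 0 || *height != 0) return 0;
  for (const char *p = name; *p; ++p) {
    int w, h;
    int at_number_start = *p >= '0' && *p <= '9' &&
                          (p == name || p[-1] < '0' || p[-1] > '9');
    if (at_number_start && sscanf(p, "%dx%d", &w, &h) == 2 && w > 0 && h > 0) {
      *width = w;
      *height = h;
      return 1;
    }
  }
  return 0;
}
```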
Arttu Ylä-Outinen
097bf8f3c0
Add a typedef for mvd coding cost functions.
2016-06-20 13:56:10 +09:00
Arttu Ylä-Outinen
d3c0e49286
Update comments.
2016-06-16 20:25:08 +09:00
Arttu Ylä-Outinen
ae832cda8c
Pack cbf flags in cu_info_t to two bytes.
...
Reduces size of cu_info_t.
2016-06-16 20:24:19 +09:00
Arttu Ylä-Outinen
cad2d496b8
Enable 4x8 and 4x16 partition modes
...
Enables search for 2NxN and Nx2N partition modes for 8x8 CUs and 2NxnU,
2NxnD, nLx2N and nRx2N partition modes for 16x16 CUs.
Changes the loop for copying reconstructed luma pixels in
kvz_inter_recon_lcu to use 4 byte chunks instead of 8 byte chunks since
it is now possible to have 4 pixel wide blocks.
2016-06-16 20:23:16 +09:00
Arttu Ylä-Outinen
90df7350f0
Make deblocking work with 4 pixel wide blocks.
2016-06-16 20:21:50 +09:00
Arttu Ylä-Outinen
bf26661782
Add support for 4x4 blocks to SATD_ANY_SIZE.
...
Makes functions satd_any_size_generic and satd_any_size_8bit_avx2 work
on blocks whose width and/or height are not multiples of 8.
2016-06-16 18:53:17 +09:00
Arttu Ylä-Outinen
2ae260e422
Change width of cells in lcu_t to 4 pixels.
...
Intra mode info for NxN partition units is now stored in the
corresponding 4x4 cell in lcu_t.cu array.
2016-06-16 18:53:17 +09:00
Arttu Ylä-Outinen
360f5bb8da
Always use pixel coordinates for indexing lcu_t.
...
Removes macro LCU_GET_CU and uses LCU_GET_CU_AT_PX in its place.
2016-06-16 18:53:17 +09:00
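With 4-pixel cells, indexing by pixel coordinates reduces to dividing both coordinates by 4. The sketch below uses a hypothetical simplified layout and macro. Any border cells that the real lcu_t may keep for neighboring CUs are omitted.

```c
/* Hypothetical 64x64 LCU stored as 16x16 cells of 4x4 pixels. */
#define SKETCH_LCU_WIDTH 64
#define SKETCH_CELL_SIZE 4
#define SKETCH_CELLS_PER_ROW (SKETCH_LCU_WIDTH / SKETCH_CELL_SIZE)

typedef struct { int intra_mode; } cell_t;
typedef struct { cell_t cu[SKETCH_CELLS_PER_ROW * SKETCH_CELLS_PER_ROW]; } lcu_sketch_t;

/* Cell covering pixel (x_px, y_px) inside the LCU. */
#define LCU_CELL_AT_PX(lcu, x_px, y_px) \
  (&(lcu)->cu[((x_px) / SKETCH_CELL_SIZE) + \
              ((y_px) / SKETCH_CELL_SIZE) * SKETCH_CELLS_PER_ROW])
```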
Arttu Ylä-Outinen
46e8122d27
Add functions for indexing cu_array_t structures.
...
Replaces macro CU_ARRAY_AT with functions kvz_cu_array_at and
kvz_cu_array_at_const.
2016-06-16 18:52:19 +09:00
Arttu Ylä-Outinen
c5afabdd3b
Change width of cells in cu_array_t to 4 pixels.
2016-06-15 12:25:11 +09:00
Arttu Ylä-Outinen
57a3d9b4b9
Add a function for copying CU data from LCUs.
...
Adds function kvz_cu_array_copy_from_lcu which copies CU info data from an
lcu_t structure to a cu_array_t structure.
2016-06-15 12:25:11 +09:00
Arttu Ylä-Outinen
2c85a00a55
Change kvz_cu_array_alloc to use pixel dimensions.
...
Changes function kvz_cu_array_alloc to take width and height parameters
in pixels instead of SCUs.
2016-06-15 12:25:11 +09:00
Arttu Ylä-Outinen
b276a347c0
Add a macro for indexing cu_array_t.
...
Adds macro CU_ARRAY_AT(cu_array, x, y) to cu.h.
2016-06-15 12:25:11 +09:00
Arttu Ylä-Outinen
8ac1f1986e
Move CU array copy to a separate function.
...
Moves code for copying parts of cu_array_t to a new function
kvz_cu_array_copy in cu module.
2016-06-15 12:25:11 +09:00
Arttu Ylä-Outinen
41e75daed7
Fix overlapping memcpy in kvz_search_cu_smp.
...
The destination and source pointers might be equal. Fixed by replacing
the memcpy call with a simple assignment.
2016-06-15 12:25:11 +09:00
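The assignment is safe where memcpy was not. memcpy requires non-overlapping regions, whereas C allows an assignment whose source and destination overlap exactly. Here is a sketch with a hypothetical struct.

```c
#include <stdint.h>

/* Hypothetical CU record. */
typedef struct { int32_t mv[2]; uint8_t flags; } cu_sketch_t;

/* memcpy(dst, src, sizeof *dst) is undefined if the regions overlap, and
 * dst == src counts as overlap. Assignment is defined for an exact overlap
 * of same-typed objects, so it also covers dst == src. */
static void copy_cu(cu_sketch_t *dst, const cu_sketch_t *src)
{
  *dst = *src;
}
```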
Ari Lemmetti
29af8bcd21
Remove const to match function signature
2016-06-14 18:19:40 +03:00
Eemeli Kallio
5af6ab320c
Merge branch 'me_early_terminate'
...
Conflicts:
configure.ac
src/cfg.c
src/cli.c
src/kvazaar.h
src/search_inter.c
2016-06-14 15:03:35 +03:00
Eemeli Kallio
43c7778b82
Updated version number.
2016-06-14 10:53:04 +03:00
Arttu Ylä-Outinen
23fdeeaf10
Move mv_cand and mv_dir into a bitfield in cu_info_t.
...
Reduces size of cu_info_t.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen
35aadf6776
Reduce size of type in cu_info_t to two bits.
...
Reduces size of cu_info_t.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen
1cbe844f79
Move inter and intra into a union in cu_info_t.
...
Reduces size of cu_info_t.
2016-06-14 12:21:57 +09:00
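These cu_info_t changes pack small fields into bitfields and use a union because a CU is either intra or inter. Their effect can be sketched with hypothetical before/after layouts. The field names only loosely follow the commit messages, and the exact sizes depend on the compiler's ABI.

```c
#include <stdint.h>

/* Hypothetical layout before: full-byte fields, intra and inter side by side. */
typedef struct {
  uint8_t type, skipped, merged, tr_depth;
  struct { int8_t mode; } intra;
  struct {
    int16_t mv[2][2];
    uint8_t mv_ref[2];
    uint8_t mv_cand[2];
    uint8_t mv_dir;
  } inter;
} cu_info_before_t;

/* Hypothetical layout after: bitfields for small values and an anonymous
 * union (C11) for the mutually exclusive intra/inter data. */
typedef struct {
  uint8_t type     : 2;
  uint8_t skipped  : 1;
  uint8_t merged   : 1;
  uint8_t tr_depth : 3;
  union {
    struct { int8_t mode; } intra;
    struct {
      int16_t mv[2][2];
      uint8_t mv_ref[2];
      uint8_t mv_cand0 : 1, mv_cand1 : 1, mv_dir : 2;
    } inter;
  };
} cu_info_after_t;
```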
Arttu Ylä-Outinen
b6d793ef33
Drop field inter.mvd from cu_info_t
...
Instead of storing the mv differences in cu_info_t, they are computed
from the mv candidates and the motion vector. Reduces the size of
cu_info_t.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen
98aa906f30
Drop field coded from cu_info_t
...
It can be inferred from the position and size of the CU.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen
ebb10763f1
Drop field inter.mv_ref_coded from cu_info_t.
...
Storing inter.mv_ref_coded in cu_info_t is unnecessary since it can be
computed from refmap and inter.mv_ref.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen
4be5c8f349
Move flags into a bitfield in cu_info_t.
...
Reduces the size of cu_info_t.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen
30e9ee988d
Move bitcost field out of cu_info_t.inter.
...
The bitcost is only needed for the currently searched CU.
Fixes bitcost of the second PU being ignored when using SMP or AMP.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen
16d13ed046
Move cost field out of cu_info_t.inter
...
The cost is only needed for the currently searched CU.
2016-06-14 12:20:05 +09:00
Arttu Ylä-Outinen
c5c2c182d9
Drop unused field mode from cu_info_t.inter.
2016-06-14 12:18:17 +09:00
Eemeli Kallio
e4f1a74512
Added early termination option for motion estimation.
...
Conflicts:
src/search_inter.c
2016-06-13 16:20:35 +03:00
Wassim Hamidouche
5bc7287c67
add fix for crypto
2016-06-09 10:49:31 +03:00
Wassim Hamidouche
35634b5596
correct MV sign encryption
2016-06-09 10:49:31 +03:00
Wassim Hamidouche
15abdc6e81
correct sign encryption
2016-06-09 10:49:31 +03:00
Wassim Hamidouche
73c3203a26
encrypt transform coefficients
2016-06-09 10:49:31 +03:00
Wassim Hamidouche
7ad5f8bbe5
encrypt transform coefficient signs
2016-06-09 10:49:31 +03:00
Wassim Hamidouche
02b0712973
fix g++ compilation
2016-06-09 10:48:44 +03:00
Ari Koivula
a2170f0763
Compile the cryptopp wrapper only when used
...
This should allow us to avoid an unnecessary dependency on a C++
compiler.
Conflicts:
configure.ac
2016-06-07 17:11:12 +03:00
Ari Koivula
182038c743
Don't allow enabling encryption when it's not compiled in
2016-06-07 16:58:09 +03:00
Ari Koivula
8eb087120e
Make VisualStudio ignore the crypto stuff
...
Add stubs for the crypto functions so we can refer to them, even if we
never use them.
2016-06-07 16:58:09 +03:00
Wassim Hamidouche
76cb6dc6c2
add check flags
2016-06-07 10:54:26 +02:00
Ari Koivula
60ea8a359f
Add --crypto parameter
2016-06-07 10:31:40 +02:00
Wassim Hamidouche
02308d1ba6
add MVs encryption
2016-06-07 10:28:30 +02:00
Wassim Hamidouche
4637c8a828
compile Kvazaar encoder with ITpp library
2016-06-07 08:33:04 +02:00
Eemeli Kallio
8f182ac6de
Added functions select_starting_point and mv_in_merge to search_inter.c
2016-06-06 17:16:04 +03:00
Ari Koivula
fe71638a96
Fix problem with ASM compilation
...
When compiling C++ files along with C, libtool would complain about
the --tag missing, even though CC should be the default.
2016-06-06 15:47:56 +03:00
Eemeli Kallio
836a3b1daa
Added functions select_starting_point and mv_in_merge.
2016-06-06 12:18:33 +03:00
Ari Koivula
4eaacbe23e
Fix bug with lp-gop and ratecontrol
...
The first frame was always qp51 due to gop_offset being -1 for the
first frame. This fix makes it so that bits are allocated as if it was
the last (high quality) frame from the previous GOP.
2016-05-27 15:53:55 +03:00
Ari Koivula
3fbd7ed97f
Add GOP layer weights for lowdelay-P
...
When using ratecontrol with lowdelay-P, this improves BDRate by 1-25%.
Strongest effect is when using 4 layers and multiple references.
Also allow using 1 or 2 layers with ratecontrol.
2016-05-27 13:46:26 +03:00
Ari Koivula
67acead4bc
Fix referring over IDR boundary when using --gop
...
This problem resulted in an illegal bitstream with --gop=lp, because it
uses IDRs. The --gop=8 would not code IDR pictures, even when told to
with -p, which masked this problem.
This fix solves the problem with --gop=lp and also prevents references
across the intra picture in --gop=8. The intra pictures should be set
to IDR in a later fix, or an alternate method of differentiating
between IDR and non-IDR intra should be made.
2016-05-27 13:20:53 +03:00
Ari Koivula
a77dc1610e
Refactor encoder_state_remove_refs
...
I needed to debug this, so I rewrote it to make sense. There is an
obvious bug with the IDR handling that I left in place to fix in a
separate commit.
2016-05-27 13:20:45 +03:00
Eemeli Kallio
b5c05e58e0
Fixed typo in strategyselector.c
2016-05-24 11:04:29 +03:00
Ari Lemmetti
68c6f0f7b8
Enable deblocking for every preset
...
Deblocking adds very little complexity
while giving massive coding performance boost
2016-05-17 18:50:31 +03:00
Ari Lemmetti
6a07761b46
Add smp and amp options to presets
2016-05-17 14:26:58 +03:00
Ari Lemmetti
3107a93eaf
Fix avx2 chroma sampling for amp
2016-05-17 14:09:57 +03:00
Ari Koivula
24d0f9f685
Fix usage message for --hash
2016-05-11 15:03:43 +03:00
Ari Koivula
a1c772b696
Merge pull request #136 from MrAsura/cu-split-termination
...
Cu split termination
Closes #133.
2016-05-10 17:22:08 +03:00
Jaakko Laitinen
7010526b1d
Removed tabs.
2016-05-10 15:52:44 +03:00
Jaakko Laitinen
a77eb5c874
Fixed type conversion error when parsing cu split termination.
2016-05-10 14:34:46 +03:00
Jaakko Laitinen
0d361d5bc7
Moved cu split termination from a preprocessor define to an input parameter.
2016-05-10 14:15:41 +03:00
Ari Koivula
1dbe4eb852
Merge branch 'mv-full'
2016-05-10 13:28:07 +03:00
Ari Koivula
f6a9d237a3
Merge pull request #134 from miimiz/testink_eemeli
...
Strategyselector prints
2016-05-10 13:27:23 +03:00
Eemeli Kallio
8cfeed852c
Added print about SIMD optimizations available and in use to strategyselector.
2016-05-10 12:59:15 +03:00
Ari Koivula
f51a68b6fa
Add different sizes of search window for full search
2016-04-21 15:11:35 +03:00
Ari Lemmetti
efbdc5dade
Utilize registers more efficiently for 8x8 and larger blocks
2016-04-21 13:26:38 +03:00
Ari Lemmetti
192cee95b2
Vectorize vertical filtering
2016-04-21 13:26:38 +03:00
Ari Lemmetti
0be35f72b8
Filter 4 pixels simultaneously in x direction
2016-04-21 13:26:38 +03:00
Ari Lemmetti
10484bda9f
Make strategies out of fractional pixel sample functions
2016-04-21 13:26:38 +03:00
Ari Koivula
28e7548387
Fix bug in full mv search
...
This optimization led to some points not being searched.
2016-04-21 12:03:57 +03:00
Ari Koivula
2576aeee0b
Use merge candidates in full mv search
...
Perform a full search window around every mv candidate and the
0-vector.
2016-04-20 20:47:11 +03:00
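Below is a sketch of full search around several starting points, assuming a caller-supplied cost function. The names and the plain square window are assumptions. The example cost measures distance to a target vector. It shows that a window centered on a candidate can reach a vector that the window around the 0-vector alone would miss.

```c
#include <limits.h>
#include <stdlib.h>

typedef struct { int x, y; } mv_t;

/* Hypothetical cost callback, e.g. SAD plus motion vector bit cost. */
typedef int (*mv_cost_fn)(mv_t mv, void *ctx);

/* Search a (2*range+1)^2 window around every candidate and around (0,0).
 * Overlapping windows are searched again; duplicates are not filtered. */
static mv_t full_search(const mv_t *cands, int num_cands, int range,
                        mv_cost_fn cost, void *ctx)
{
  mv_t best = {0, 0};
  int best_cost = INT_MAX;
  for (int c = 0; c <= num_cands; ++c) {
    mv_t center = (c < num_cands) ? cands[c] : (mv_t){0, 0};
    for (int dy = -range; dy <= range; ++dy) {
      for (int dx = -range; dx <= range; ++dx) {
        mv_t mv = {center.x + dx, center.y + dy};
        int cst = cost(mv, ctx);
        if (cst < best_cost) { best_cost = cst; best = mv; }
      }
    }
  }
  return best;
}

/* Example cost: city-block distance to a known target vector. */
static int dist_to_target(mv_t mv, void *ctx)
{
  const mv_t *t = (const mv_t *)ctx;
  return abs(mv.x - t->x) + abs(mv.y - t->y);
}
```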
Ari Lemmetti
8247faf8e0
Remove 64-bit only instruction to fix 32-bit compilation.
2016-04-19 18:05:11 +03:00
Ari Lemmetti
eb55d6b6b9
Fix writing over boundary.
2016-04-19 16:03:43 +03:00
Ari Lemmetti
bcabc6fadd
Remove pixel blit from strategies. Use memcpy instead.
2016-04-06 18:44:04 +03:00
Ari Lemmetti
2140197ccc
Tidy up coeff blit function and use memcpy again.
...
Give memcpy constants for fixed sizes to enable copying many bytes simultaneously.
2016-04-06 18:03:00 +03:00
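When memcpy receives a compile-time constant size, the compiler can usually expand it into a few wide loads and stores. Here is a sketch using a hypothetical coefficient blit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical blit of a width x width coefficient block. The switch gives
 * memcpy a constant length for the common sizes. */
static void blit_coeffs(const int16_t *src, int16_t *dst, int width,
                        int src_stride, int dst_stride)
{
  for (int y = 0; y < width; ++y) {
    const int16_t *s = src + y * src_stride;
    int16_t *d = dst + y * dst_stride;
    switch (width) {
      case 4:  memcpy(d, s, 4 * sizeof(int16_t)); break;
      case 8:  memcpy(d, s, 8 * sizeof(int16_t)); break;
      case 16: memcpy(d, s, 16 * sizeof(int16_t)); break;
      case 32: memcpy(d, s, 32 * sizeof(int16_t)); break;
      default: memcpy(d, s, (size_t)width * sizeof(int16_t)); break;
    }
  }
}
```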