hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-12-04 21:54:05 +00:00

Author	SHA1	Message	Date
Marko Viitanen	3fad4b0a98	Disable kvz_cabac_encode_aligned_bins_ep for now and add a ToDo message	2019-07-03 15:44:35 +03:00
Sami Ahovainio	ce1e67cc3a	Modified header flags to match VTM commit b9080ff45bec368c44f0c43a32dcd6804ef9f5d6	2019-07-01 13:58:15 +03:00
Sami Ahovainio	3863064d90	Fixed bugs in split decision and coefficient coding.	2019-07-01 13:00:43 +03:00
Mikko Pitkänen	a7f09c8114	Merge branch 'threadwrapper'	2019-06-24 16:54:59 +03:00
Sami Ahovainio	db5c0230e5	Fixed coefficient sign hiding	2019-06-20 16:26:01 +03:00
Sami Ahovainio	b51254cafd	Fixed significant coefficient group context calculation	2019-06-20 15:47:13 +03:00
Sami Ahovainio	5e0bea962c	Fixed split context decision	2019-06-20 15:30:49 +03:00
Sami Ahovainio	12322144f0	Removed debug print from context.c	2019-06-20 15:18:22 +03:00
Sami Ahovainio	3a9800d07d	Fixed coefficient coding. Fixed headers to match VTM commit e65075531471a68632bc9252d607655a0feeabc6	2019-06-20 14:43:03 +03:00
Mikko Pitkänen	3dd606ce2e	Add new threadwrapper	2019-06-18 18:45:45 +03:00
Sami Ahovainio	2c78aa0642	Fixes to coeff coding.	2019-06-13 12:01:29 +03:00
Joose Sainio	c94077d15e	remove hardcoded value	2019-06-12 14:37:41 +03:00
Joose Sainio	ac68c8444d	remove negation that wasn't supposed to be there	2019-06-12 14:35:24 +03:00
Joose Sainio	5851dcc3be	missing negation	2019-06-12 14:08:18 +03:00
Joose Sainio	1c36f68d0c	Fix owf>=9 gop=8 and add test to catch such problem in future	2019-06-12 14:04:41 +03:00
Sami Ahovainio	3564b4829e	Fixed split context decision. Modified intra mode initialization to match VTM version aa76fc5c04cf43390f43d63f9977bea8ee31997a.	2019-06-12 12:59:16 +03:00
Sami Ahovainio	a8a53e15b5	Fixed headers to match VTM commit aa76fc5c04cf43390f43d63f9977bea8ee31997a. Added multi_ref_line flag coding.	2019-06-07 13:37:45 +03:00
Ari Lemmetti	933ff6ed55	Merge branch 'set-qp-in-cu-fix'	2019-06-07 09:01:03 +03:00
Sami Ahovainio	8d2581e58c	Fixed issue with kvz_go_rice_par_abs where passing an unsigned argument caused the MIN function to return the wrong value. Modified coefficient coding to match VTM 5.0. Some issues still remain.	2019-06-05 15:57:18 +03:00
Sami Ahovainio	367f1b2129	Fixed splitting bug caused by wrong values in the headers. Fixed header flags to match VTM commit 5703e81b2de677d976ec15423f5768b17619ba6a	2019-06-05 11:21:02 +03:00
Sami Ahovainio	76d56290ed	Fixed VUI header writing. Fixed debug prints of NAL headers and rbsp_stop_one_bit.	2019-05-31 11:13:11 +03:00
Ari Lemmetti	c6da839002	Set lcu sqrt lambda according to lcu lambda instead of frame lambda when ROI is used	2019-05-29 18:32:10 +03:00
Marko Viitanen	8282a18c36	Fixed headers and NAL writing to match the latest VTM master 988c22cbb9c58584cac3ef0ec7794cafbea6dfd6	2019-05-29 16:18:35 +03:00
Sami Ahovainio	4768ba0628	Minor fixes to header writing. Added contexts for multi_ref_line and BDPCM. Functions added for writing both in bitstream, but they are both disabled for now.	2019-05-29 13:00:19 +03:00
Sami Ahovainio	3339e12169	Fixed some header flags	2019-05-27 09:56:56 +03:00
Ari Lemmetti	9339845e8b	Set QP completely at CU level as the name '--set-qp-in-cu' implies: move slice delta QP to CU level when using --set-qp-in-cu; separate functionality from ROI	2019-05-24 20:38:39 +03:00
Pauli Oikkonen	081d16fc33	Fix intrinsics that may be missing on some systems. Create a header to collect all the workarounds for missing intrinsics in one place	2019-05-23 19:59:40 +03:00
Sami Ahovainio	5b46fbd878	Added multi_ref_idx variable for intra coding (is 0 throughout the code for now). Modified prediction flag writing. Chroma pred flag remains unchanged (ToDo). Added bitstream debug printing on VERBOSE mode.	2019-05-21 12:28:05 +03:00
Sami Ahovainio	ed4e218702	Updated coefficient coding to match VTM 5.0	2019-05-13 15:30:43 +03:00
Sami Ahovainio	504c3dfd1b	Modified the headers to match current VTM headers	2019-05-07 16:30:06 +03:00
Marko Viitanen	30a8a7b97c	WIP fixing the last significant xy coding	2019-05-07 15:01:02 +03:00
Pauli Oikkonen	87a9208db8	Eliminate cvtsi64_si128 intrinsic. Apparently it'll cause Win32 builds to break because it emits the movq instruction or something.	2019-04-17 16:30:40 +03:00
Pauli Oikkonen	7175d20bb2	Still include stdint.h for non-vector builds	2019-04-15 19:36:01 +03:00
Pauli Oikkonen	1315c7e2b0	Do not compile any vector code for non-SSE4/AVX2 builds	2019-04-15 19:10:48 +03:00
Pauli Oikkonen	f5f70e7bc5	Merge branch 'sad-optimization'	2019-04-15 19:02:01 +03:00
Jan Beich	85f46e17a9	Detect AltiVec via elf_aux_info() on FreeBSD 12+	2019-04-01 13:08:04 +00:00
Jan Beich	82486255da	Simplify AltiVec detection on Linux	2019-04-01 13:08:04 +00:00
Marko Viitanen	1546acfdb9	New NAL unit IDs and header changes	2019-03-28 10:11:36 +02:00
Marko Viitanen	36eab9c170	New cabac context models with "rate"	2019-03-27 12:38:19 +02:00
Marko Viitanen	3bdc8ac8d3	Fix intra_chroma_pred_mode and cbf contexts	2019-03-26 09:10:09 +02:00
Marko Viitanen	d15f58517f	Changed intra coding to use 6 MPM, implemented merge sort and MPM selection	2019-03-20 15:20:31 +02:00
Marko Viitanen	1081336868	Updated intra pred mode init values	2019-03-20 15:18:32 +02:00
Marko Viitanen	f3acd245ae	New cabac coding function: kvz_cabac_encode_trunc_bin	2019-03-20 15:17:54 +02:00
Marko Viitanen	80d6e4bf05	New split flag calculations	2019-03-20 09:07:58 +02:00
Marko Viitanen	8c84348010	New entropy bit table	2019-03-20 09:07:22 +02:00
Marko Viitanen	2d0348aa6d	New context models	2019-03-20 09:06:57 +02:00
Marko Viitanen	052080747e	New CABAC functions	2019-03-20 09:06:26 +02:00
Marko Viitanen	20667fdba6	Update header bits to VTM 4.0+	2019-03-11 14:02:12 +02:00
Pauli Oikkonen	6d43759604	Create a border-respecting 32-wide AVX hor_sad	2019-03-07 18:01:22 +02:00
Pauli Oikkonen	f218cecb38	Remove offending hor_sad_avx2_w32 function. Consider possibly creating a non-offending AVX2 version instead, the way hor_sad_sse41_w32 works. Or maybe there's more essential work to do.	2019-03-05 22:51:41 +02:00
Pauli Oikkonen	df2e6c54fd	4-unroll hor_sad_sse41_arbitrary. This may not increase perf though, because it's such a rarely used function, so keeping the icache footprint down may be more essential...	2019-03-05 22:45:23 +02:00
Pauli Oikkonen	448eacba7b	Avoid overreading block borders in hor_sad_sse41_arbitrary	2019-03-05 22:34:50 +02:00
Eemeli Kallio	c159e275b7	Merge branch 'max_merge'	2019-03-05 14:39:03 +02:00
Pauli Oikkonen	41f51c08c4	Avoid overrunning buffer in hor_sad_sse41_w32	2019-03-01 15:37:38 +02:00
Pauli Oikkonen	bcd9879359	Include quant coeff range check in non-scaling list execution path too	2019-02-27 17:26:44 +02:00
Pauli Oikkonen	24e6363f64	Remove the kvz_quant_avx2 wrapper function	2019-02-27 16:32:58 +02:00
Pauli Oikkonen	748820f3c5	Eliminate unnecessary loading of coeffs if scaling lists are off	2019-02-27 16:26:35 +02:00
Pauli Oikkonen	5994350f40	Allow quant_flat_avx2 to be used with scaling lists on	2019-02-27 16:25:59 +02:00
Eemeli Kallio	7f4e0acf41	Added check if max-merge is out of bounds	2019-02-19 13:53:42 +02:00
Pauli Oikkonen	9b0e079262	Use SSE instructions for 64-bit SADs instead of MMX. VC++ seems to choke on MMX instructions	2019-02-18 20:13:33 +02:00
Pauli Oikkonen	d8b8923028	Add LGPL notices to reg_sad headers	2019-02-18 17:52:47 +02:00
Eemeli Kallio	2a40560888	some variables to const	2019-02-12 11:24:10 +02:00
Eemeli Kallio	8f8e7bb53c	Added possibility to reduce the maximum number of merge candidates.	2019-02-12 09:21:03 +02:00
Marko Viitanen	1165219842	Update PTL, SPS ext and SPS flags to match VTM 4rc1	2019-02-07 10:00:04 +02:00
Pauli Oikkonen	770db825b9	Create hor_sad_w8 and w4 epol mask the way w16 works	2019-02-06 19:34:26 +02:00
Pauli Oikkonen	aa19bcac8a	Avoid branching in creating shuffle mask in hor_sad_w16	2019-02-06 18:58:46 +02:00
Pauli Oikkonen	2d05ca8520	Remove width from constant-width hor_sad func params. They should kinda know it already	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	57db234d95	Move 32-wide SSE4.1 hor_sad to picture-sse41.c. It's not used by picture-avx2.c, which also includes the header, so it should not be in the header	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	dd7d989a39	Implement 32-wide hor_sad on AVX2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ff70c8a5ec	Utilize horizontal SAD functions for SSE4.1 as well	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	f5ff4db01f	4-wide hor_sad border agnostic	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	35e7f9a700	Fix hor_sad w8 to work with both borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	836783dd6e	Use hor_sad_w32 for both left and right borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	69687c8d24	Modify hor_sad_sse41_w16 to work over left and right borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	51c2abe99a	Modify image_interpolated_sad to use kvz_hor_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	1e0eb1af30	Add generic strategy for hor_sad'ing a non-split width block	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	686fb2c957	Unroll arbitrary-width SSE4.1 hor_sad by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	768203a2de	First version of arbitrary-width SSE4.1 hor_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ccf683b9b6	Start work on left and right border aware hor_sad. Comes with 4, 8, 16 and 32 pixel wide implementations now; at some point investigate if this can start to thrash icache	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	760bd0397d	Pad the image buffer by 64 bytes from both ends. This will be necessary for an efficient and straightforward implementation of hor_sad for blocks over 16 pixels wide, because they cannot use the shuffle trick, as inter-lane shuffling is so hard to do	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	c36482a11a	Fix bug in 24-wide SAD facepalm	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	f781dc31f0	Create strategy for ver_sad. Easy to vectorize	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ca94ae9529	Handle extrapolated blocks with unmodified width using optimized_sad pointer	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91b30c7064	Tidy up kvz_image_calc_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	9db0a1bcda	Create get_optimized_sad func for SSE4.1	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91380729b1	Add generic get_optimized_sad implementation. NOTE: To force generic SAD implementation on devices supporting vectorized variants, you now have to override both get_optimized_sad and reg_sad to generic (only overriding get_optimized_sad on AVX2 hardware would just run all SAD blocks through reg_sad_avx2). Let's see if there's a more sensible way to do it, but it's not trivial.	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	45f36645a6	Move choosing of tailored SAD function higher up the calling chain	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91cb0fbd45	Create strategy for directly obtaining pointer to constant-width SAD function	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	94035be342	Unify unrolling naming conventions	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	517a4338f6	Unroll SSE SAD for 8-wide blocks to process 4 lines at once	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	0f665b28f6	Unroll arbitrary width SSE4.1 SAD by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	cbca3347b5	Unroll 64-wide AVX2 SAD by 2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	84cf771dea	Unroll 32 and 16 wide SAD vector implementations by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	5df5c5f8a4	Cast all pointers to const types in vector SAD funcs. Also tidy up the pointer arithmetic	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a711ce3df5	Inline fixed width vectorized SAD functions	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	6504145cce	Remove 16-pixel wide AVX2 SAD implementation. At least on Skylake, it's noticeably slower than the very simple version using SSE4.1	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4cb371184b	Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	796568d9cc	Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4d45d828fa	Use constant-width SSE4.1 SAD funcs for AVX2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	2eaa7bc9d2	Move SSE4.1 SAD functions to separate header	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	d2db0086e1	Create constant width SAD versions for 8 and 16 pixels	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a13fc51003	Include a blank AVX2 strategy registration function even in non-AVX2 builds	2019-02-04 19:52:24 +02:00
Pauli Oikkonen	d55414db66	Only build AVX2 coeff encoding when supported... whoops	2019-02-04 19:34:30 +02:00
Pauli Oikkonen	3fe2f29456	Merge branch 'encode-coeffs-avx2'	2019-02-04 18:52:31 +02:00
Pauli Oikkonen	722b738888	Fix more naming issues	2019-02-04 16:05:43 +02:00
Pauli Oikkonen	e26d98fb75	Rename a couple variables and add crucial comments	2019-02-04 15:57:07 +02:00
Pauli Oikkonen	f186455619	Move encode_last_significant_xy out of strategy modules. It's exactly the same in both AVX2 and generic, and does not seem to be worth even trying to vectorize	2019-02-04 14:55:41 +02:00
Pauli Oikkonen	3f7340c932	Fine-tune pack_16x16b_to_16x2b. Avoid an mm_set1 operation when it's possible to create the constant from another one with a single bit-shift operation instead. Thanks Intel for 3-operand instruction encoding!	2019-02-04 14:44:47 +02:00
Pauli Oikkonen	314f5b0e1f	Rename 16x2b cmpgt function, comment it better, optimize it slightly. Eliminate an unnecessary bit mask to make it even more messy	2019-02-04 14:44:32 +02:00
Pauli Oikkonen	d8ff6a6459	Fix _andn_u32 to work on old Visual Studio	2019-02-01 15:34:42 +02:00
Pauli Oikkonen	26e1b2c783	Use (u)int32_t instead of (unsigned) int in reg_sad_sse41	2019-01-10 14:37:04 +02:00
Pauli Oikkonen	3a1f2eb752	Prefer SSE4.1 implementation of SAD over AVX2. It seems that the 128-bit wide version consistently outperforms the 256-bit one	2019-01-10 13:48:55 +02:00
Pauli Oikkonen	9b24d81c6a	Use SSE instead of AVX for small widths. Highly dubious if this will help performance at all	2019-01-07 20:12:13 +02:00
Pauli Oikkonen	b2176bf72a	Optimize SSE4.1 version of SAD. Make it use the same vblend trick as AVX2. Interestingly, on my test setup this seems to be faster than the same code using 256-bit AVX vectors.	2019-01-07 19:40:57 +02:00
Pauli Oikkonen	887d7700a8	Modify AVX2 SAD to mask data at byte granularity in AVX registers. Avoids using any SAD calculations narrower than 256 bits, and simplifies the code. Also improves execution speed	2019-01-07 18:53:15 +02:00
Pauli Oikkonen	7585f79a71	AVX2-ize SAD calculation. Performance is no better than SSE though	2019-01-07 16:26:24 +02:00
Pauli Oikkonen	ab3dc58df6	Copy SAD SSE4.1 impl to AVX2	2019-01-03 18:31:57 +02:00
Pauli Oikkonen	45ac6e6d03	Tidy pack_16x16b_to_16x2b comments	2019-01-03 16:37:05 +02:00
Ari Lemmetti	cd818db724	Add missing quantization and residual in cost calculation (inter rd=2).	2018-12-21 15:55:29 +02:00
Pauli Oikkonen	016eb014ad	Move packing 16x16b -> 16x2b into separate function	2018-12-20 10:51:44 +02:00
Ari Lemmetti	b234897e8a	Fix smp and amp blocks in fme and revert previous change. Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc. Calculate SATD on the 8x4, ... part	2018-12-19 21:30:53 +02:00
Pauli Oikkonen	9aaa6f260d	Fixes to enable portability	2018-12-18 20:42:09 +02:00
Pauli Oikkonen	2fdbbe9730	Move CG reordering code from quant-avx2 to shared header	2018-12-18 19:42:18 +02:00
Pauli Oikkonen	d02207306d	Create a header file for shared AVX2 code	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	361bf0c7db	Precompute >=2 coeff encoding loop with 2-bit arithmetic. Who needs 16x16b vectors when you can do practically the same with 16x2b pseudovectors in 32-bit general purpose registers!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	940b0e9e6a	Require BMI2 for AVX2 build. Any processor implementing AVX2 should also implement BMI2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	f66cb23d5b	Optimize greater1 encoding loop. Calculating the c1 variable need not be a serial operation!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	8c8b791c35	Vectorize kvz_context_get_sig_ctx_inc	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	033261eb74	Eliminate two branches using bit magic	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c4434e8d04	Scan CGs in forward order to simplify finding the last significant one	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	efd097f5a5	Vectorize the coeff group loop to some extent	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	a01362e638	use the efficient method of reordering raster->scan	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	50a888e789	Use the efficient method to find first and last nz coeffs in block	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	7e9203f566	Scan coeff groups in scan order to help find last significant one	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	9a5a6fdbc7	Simplify two ifs in encode_coeff_nxn-avx2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	37a2a8bac8	See if loop can be optimized by rearranging	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	584f2f74b6	Vectorize significant coeff group scanning loop	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	1bfed73221	Add AVX2 strategy for encode_coding_tree	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c3a6f3112a	Add generic strategy group for encode_coding_tree	2018-12-18 19:41:09 +02:00
Marko Viitanen	1ef851ab4b	Disable FME on amp/smp blocks with width or height not divisible by 8	2018-12-18 10:28:21 +02:00
Joose Sainio	b71c5573f0	Merge branch 'rate_control_fix'	2018-12-17 12:39:27 +02:00
Sergei Trofimovich	68a70e45a1	x86 asm: mark stack as non-executable. Gentoo's `scanelf` QA tool detects a writable/executable stack in assembly-written files: `$ scanelf -qRa .` reports `0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o`, `0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o`, `0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-sad.o` and `0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-satd.o`. Normally the C compiler emits the non-executable stack marking (or the GNU assembler does, via `-Wa,--noexecstack`). The change adds non-executable stack marking for yasm-based assembly files. https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>	2018-12-16 11:31:56 +00:00
Reima Hyvönen	1fcc5c6a8d	Merge branch 'bipred_recon'	2018-12-11 09:59:35 +02:00
Reima Hyvönen	e4a10880f3	Added case 12 to bipred_recon no mov	2018-12-11 09:52:17 +02:00
Marko Viitanen	a4f3968e52	Fix Visual Studio errors by initializing some variables used in AVX2 signhiding	2018-12-11 09:33:26 +02:00
Ari Lemmetti	ac943147e3	Calculate satd cost for whole non-square blocks as well.	2018-12-10 17:04:29 +02:00
Pauli Oikkonen	c465578048	Add a descriptive comment to coefficient reordering	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	f78bf2ebcb	Optimize q_coefs usage for indexed fetch	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	d9591f1b49	Eliminate midway buffering of reordered coefs. TODO: For some mysterious reason seems slightly slower than the buffered one	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	7fe454c51f	Optimize get_cheapest_alternative()	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	6bbd3e5a44	Optimize rearrange_512 function	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	cb8209d1b3	Vectorize transform coefficient reordering loop	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	7cf4c7ae5f	Rename "reduce" functions to hsum. That's what the functions fundamentally do anyway	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	316cd8a846	Fix ALIGNED keyword and grow alignment to 64B	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	1befc69a4c	Implement sign bit hiding in AVX2	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	c5cd03497e	Require BMI and ABM instruction sets for AVX2 build. AVX2 support on a processor should always imply BMI and ABM support. The lzcnt and tzcnt instructions have more suitable semantics in the corner case where the source word is 0, and allow us to handle even that scenario without a branch. Apparently Visual Studio will already include this support when building with AVX2 enabled, so only the automake files need to be tweaked.	2018-12-03 15:36:32 +02:00
Reima Hyvönen	f8696b54a4	Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks with widths other than 8 (e.g. width = 12)	2018-11-20 17:09:19 +02:00
Marko Viitanen	a5a10a33c3	Enable --scaling-list parameter and add to the documentation	2018-11-19 10:47:30 +02:00
Reima Hyvönen	710ba288db	Chroma has some problems	2018-11-15 16:42:48 +02:00
Sami Ahovainio	8f98d4aac7	Added square search	2018-11-14 14:50:31 +02:00
Marko Viitanen	6871490dd5	Simplify get_mvd_coding_cost(), only include golomb coding	2018-11-14 14:33:31 +02:00
Ari Lemmetti	a832206bb6	Replace 32-bit incompatible intrinsics	2018-11-12 18:54:33 +02:00
Ari Lemmetti	5c774c4105	Rewrite most of FME and interpolation filters. The changes had to break a lot of stuff and were just squashed into this horrible code dump	2018-11-08 20:21:16 +02:00
Joose Sainio	1c8a1f24e2	Don't assume anything about bits spent	2018-11-07 16:03:38 +02:00
Joose Sainio	3471e2470d	Fix using uninitialized value for the first frame	2018-11-07 08:17:39 +02:00
Joose Sainio	d95ac11a3b	Fix rate_control for other LP-GOPS	2018-11-06 14:20:44 +02:00
Joose Sainio	67a6ba667e	Fix rate control for flat lp-gop	2018-11-06 09:38:17 +02:00
Reima Hyvönen	7406c33a42	Some more cleaning	2018-10-26 12:25:18 +03:00
Reima Hyvönen	4c71546b2e	Cleaned some coding	2018-10-26 12:19:44 +03:00
Reima Hyvönen	4fe3909e48	Switched luma to use 32-bit ints instead of 16-bit ones	2018-10-24 18:24:46 +03:00
Marko Viitanen	465bc2cfee	[EMT] make functions static and prefix arrays with kvz_g	2018-10-18 10:54:33 +03:00
Marko Viitanen	b133e7de1e	VTM 2.2 changed -> remove high_precision_motion_vectors flag	2018-10-17 12:41:14 +03:00
Marko Viitanen	169febd1c4	[EMT] Simplify DCT8, DCT5, DST1 and DST7 definitions	2018-10-17 12:17:54 +03:00
Marko Viitanen	e015d7eb2b	Fix compiler warnings	2018-10-17 10:43:11 +03:00
Marko Viitanen	ad310c77d3	Added EMT transforms to the strategies	2018-10-17 08:56:49 +03:00
Eemeli Kallio	284e73839e	Calculating zero cost moved to its own function	2018-10-16 11:02:01 +03:00
Reima Hyvönen	381e786e10	Trying to find the bug in luma	2018-10-11 18:08:41 +03:00
Marko Viitanen	c589e5ed36	Fix closed-gop frame feed, the ordering was incorrect after the first GOP	2018-10-10 11:12:03 +03:00
Reima Hyvönen	2f5f81bac3	removed the non-optimized bipred function	2018-10-09 11:19:23 +03:00
Marko Viitanen	75dce4f3ce	Fix low-delay-gop usage with --no-open-gop	2018-10-04 15:16:02 +03:00
Marko Viitanen	de71b58f76	Change closed GOP structure to include an additional IDR between GOPs	2018-10-04 11:17:03 +03:00
Marko Viitanen	1e1a80e4a6	[TMVP] fix clamping of block offsets and clean up the code a bit	2018-10-03 12:34:48 +03:00
Reima Hyvönen	212a8e68fa	Modified to avoid memory overflow, still some bug inside luma	2018-10-02 20:23:32 +03:00
Marko Viitanen	954f07e3d7	Add --(no-)open-gop option	2018-10-02 10:05:32 +03:00
Marko Viitanen	027359c3c3	Implement TMVP duplicate checking as in VTM 2.1	2018-09-28 11:50:36 +03:00
Marko Viitanen	571a545416	Fix spatial merge candidate selection	2018-09-26 15:10:31 +03:00
Marko Viitanen	63760ca0cf	Use kvz_cabac_bins_verbose flag to control cabac debug printing	2018-09-26 12:01:23 +03:00
Marko Viitanen	7c37f456f9	Fix implicit Qt split for p-frames	2018-09-26 12:00:18 +03:00
Marko Viitanen	b6f2c66c73	Fixed intra Most Probable Mode (MPM) derivation to conform to VTM 2.1	2018-09-21 10:33:54 +03:00
Sami Ahovainio	a2b2275d87	Fixed array sizes in search_intra_rough from 35 to 67	2018-09-18 11:49:15 +03:00
Sami Ahovainio	82fb80ab6e	Fixed couple of if-clauses which still used the old intra mode range.	2018-09-17 08:56:43 +03:00
Marko Viitanen	a437d4c508	Fixed intra chroma mode bitstream writing (chroma search not used)	2018-09-13 15:05:00 +03:00
Marko Viitanen	389aeebe07	Added 2x2 transform functions	2018-09-13 14:51:07 +03:00
Marko Viitanen	445c059b4a	Fix transforms for VTM 2.0, generated new transform matrices and added a shift by 2 for forward and inverse	2018-09-13 14:39:49 +03:00
Marko Viitanen	35fa8e9785	Fix kvz_intra_get_dir_luma_predictor -> Intra working	2018-09-13 12:32:17 +03:00
Marko Viitanen	f75b0b11c3	Simplify intra filtered ref pixel selection	2018-09-13 10:09:52 +03:00
Sami Ahovainio	4bb484a86a	Fixed if-clause at search_intra.c to use new wider range of intra modes	2018-09-13 09:58:48 +03:00
Marko Viitanen	82de0fbee7	Switch intra search to use the actual 67 modes	2018-09-13 09:43:45 +03:00
Marko Viitanen	382917bcd3	New table for choosing angular intra filtered references and a small bugfix on the end condition of angular intra	2018-09-13 09:35:55 +03:00
Marko Viitanen	4aad2fa383	Fix intra mode writing	2018-09-12 10:34:58 +03:00
Marko Viitanen	d4ed0ee3ad	Fixed some array offsets in intra angular prediction	2018-09-12 08:53:17 +03:00
Marko Viitanen	20c96366ed	fix kvz_context_get_sig_ctx_idx_abs() parameter for "type" -> decoding with VVC	2018-09-10 12:51:02 +03:00
Marko Viitanen	a7ca09108c	Improve CABAC debugging by including similar info as in VTM	2018-09-10 11:00:00 +03:00
Sami Ahovainio	ce84407c69	Fixed coeff_remain writing to use the correct rice_param instead of using 0 all the time.	2018-09-07 11:24:24 +03:00
Sami Ahovainio	78ea24bcf1	Fixed sig_coeff_flag writing condition.	2018-09-06 15:48:45 +03:00
Marko Viitanen	4bebb4bb2c	Fix temp_diag and temp_sum initialization and coeff array usage in context derivation	2018-09-05 17:09:50 +03:00
Marko Viitanen	f5b6c386bc	Fix incorrect sig_flag implicity parameters and some temp variable initializations	2018-09-03 16:22:05 +03:00
Marko Viitanen	8bef85e056	Merge branch 'set-qp-in-cu'	2018-09-03 08:33:33 +03:00
Ari Lemmetti	2fdcc2b79d	Add option --set-qp-in-cu	2018-09-03 08:32:45 +03:00
Marko Viitanen	52be2f0bbe	Fixed kvz_encode_coeff_nxn and renamed some variables to match VTM	2018-08-31 15:10:17 +03:00
Sami Ahovainio	787264f568	Fixed dst indexing in kvz_angular_pred_generic	2018-08-31 10:36:28 +03:00
Sami Ahovainio	d2291fea83	Intra mode scaling moved from angular prediction to kvz_intra_predict. pdpc implemented in kvz_intra_predict.	2018-08-31 10:01:28 +03:00
Marko Viitanen	49a116ed3a	Bugfix correct array sizes for cu_ctx_last_x/y	2018-08-30 16:14:08 +03:00
Sami Ahovainio	84cef127dc	Fixed cu_gtx_flag_model_chroma initialization.	2018-08-30 15:21:16 +03:00
Marko Viitanen	7d491e639b	Add new values to last_x/y coding	2018-08-30 15:04:04 +03:00
Marko Viitanen	809805b185	Bugfixes for kvz_encode_coeff_nxn()	2018-08-30 14:50:29 +03:00
Marko Viitanen	0680f240d7	Converted kvz_encode_coeff_nxn and related helper functions to VVC K0072 format	2018-08-30 14:24:03 +03:00
Marko Viitanen	84e78c6c50	Disable writing of cabac flags not currently available	2018-08-30 11:21:44 +03:00
Marko Viitanen	e3dbaf99a9	Started implementing new coeff coding function - added kvz_context_get_sig_ctx_idx_abs for abs sig context derivation	2018-08-30 11:09:42 +03:00
Marko Viitanen	e00319b832	Fix cu_sig_coeff_group_model init and some instances of cu_sig_model usage	2018-08-30 09:08:08 +03:00
Marko Viitanen	4429e0b89d	Expand cu_sig_coeff_group_model according to VVC	2018-08-29 16:20:34 +03:00
Sami Ahovainio	578122ed43	Context changes for chroma pred modes. BT flag init and chroma pred mode init moved inside a loop.	2018-08-29 16:00:08 +03:00
Sami Ahovainio	54ebadfc43	Clarifying comments and changes towards WAIP	2018-08-29 16:00:08 +03:00
Marko Viitanen	7f119e8bdd	Added new ctx models for sig, parity and gtx, removed models for one and abs	2018-08-29 15:57:40 +03:00
Marko Viitanen	46d02c1734	Implemented JVET-K0072 based cbf context selections	2018-08-29 10:12:07 +03:00
Marko Viitanen	bb9dc22336	Disable PCM	2018-08-29 09:59:53 +03:00
Marko Viitanen	23a1292f52	Added max_binary_tree_unit_size and more comments	2018-08-29 08:23:41 +03:00
Marko Viitanen	37caa451c6	Fix VVC split flag condition for hor and ver splits at the edges. The split flag is no longer implicit when the block can be split with the BT after QT in the horizontal or vertical direction	2018-08-28 16:03:02 +03:00
Reima Hyvönen	896034b7cf	Some renamed functions back	2018-08-28 15:31:10 +03:00
Reima Hyvönen	e8b5e6db4c	Did some merging	2018-08-28 15:26:27 +03:00
Reima Hyvönen	7de5c74434	Updated bipred_recon to work faster	2018-08-28 15:12:31 +03:00
Reima Hyvönen	47b357cca2	Comment one test	2018-08-27 18:52:14 +03:00
Reima Hyvönen	2ca99a44e8	Updated shuffle operation to be in right order	2018-08-27 18:16:38 +03:00
Sami Ahovainio	42741a2c40	Some changes for PCM and Intra towards VTM 2.0 compatibility.	2018-08-27 09:18:15 +03:00
Marko Viitanen	3dc5f65fba	Add an extra bit to intra mode and map 33 angular modes to 65	2018-08-17 15:09:48 +03:00
Marko Viitanen	9aaf53fcd7	Add dep_quant_enable_flag to slice header	2018-08-17 14:58:57 +03:00
Marko Viitanen	dc92fa6fb3	Added missing ALF flag to SPS	2018-08-17 12:53:27 +03:00
Marko Viitanen	dbc74c592d	Add VTM 2.0 new flags to SPS	2018-08-17 12:47:29 +03:00
Marko Viitanen	17505c8306	Disable vertical and horizontal scan order with small blocks. Intra now working down to 8x8 luma	2018-08-17 11:38:40 +03:00
Marko Viitanen	4f7da86285	Commented out sign hiding code, which is not used in VVC	2018-08-17 09:38:11 +03:00
Marko Viitanen	c9cbdd5dc3	Added couple of ToDo comments for large CTU support	2018-08-17 09:37:14 +03:00
Marko Viitanen	daf041406f	Disable DST	2018-08-16 16:05:32 +03:00
Marko Viitanen	b85ae3688e	Signal QP in slice header if tiles and slices=tiles are enabled. Keeps the PPS constant for various purposes	2018-08-16 08:44:39 +03:00
Sami Ahovainio	5baab86597	Added BT split flags	2018-08-14 15:28:06 +03:00
Marko Viitanen	b33aa37484	Enable max_trans_hier_depth values and disable DC and angular filtering	2018-08-14 15:24:21 +03:00
Marko Viitanen	00a827007a	Use normal split flags	2018-08-14 10:57:32 +03:00
Reima Hyvönen	508b218a12	some modifications made to prevent reading too much	2018-08-14 10:50:39 +03:00
Reima Hyvönen	1d935ee888	some useless stuff removed	2018-08-13 16:47:11 +03:00
Reima Hyvönen	ce3ac4c05e	some modifications to no_mov	2018-08-13 16:41:02 +03:00
Reima Hyvönen	15a613ae94	test if no_mov breaks testing	2018-08-13 16:02:56 +03:00
Reima Hyvönen	97a2049e58	removed pointer declaration out from switch	2018-08-10 16:42:26 +03:00
Reima Hyvönen	aa94bcedbc	Stream is now pointer	2018-08-10 16:38:49 +03:00
Reima Hyvönen	fa5b227ece	256 to 32 doesn't work, made them by hand	2018-08-10 16:01:20 +03:00
Reima Hyvönen	408dedbcc8	removed _mm256_extract_epi8 and replaced with _mm_stream	2018-08-10 15:53:26 +03:00
Reima Hyvönen	31c35091c6	_mm256_cvtsi256_si32 removed	2018-08-10 10:06:40 +03:00
Reima Hyvönen	99dc43074f	_mm256_cvtsi256_si32 breaks system, too many bits. Back to extract	2018-08-10 09:59:33 +03:00
Reima Hyvönen	4f1f80b2cb	Transformed convert from 256 to cast 256 -> 128 and then convert from 128	2018-08-09 15:35:54 +03:00
Reima Hyvönen	4957555eb3	Removed leftover from 939	2018-08-09 15:25:03 +03:00
Reima Hyvönen	28b165c971	Clarified some sections, added _MM_SHUFFLE macro	2018-08-09 15:23:01 +03:00
Reima Hyvönen	dd04df8667	testing if error in both avx2 functions	2018-08-03 11:49:00 +03:00
Reima Hyvönen	ed50d71fde	Switched some variables to different location, altered inter_recon_bipred_avx2 function	2018-08-02 16:08:59 +03:00
Reima Hyvönen	f5739a0028	Renaming and removing useless prints	2018-08-02 14:47:17 +03:00
Reima Hyvönen	bc09f59bb6	Edited some definitions	2018-08-02 11:54:53 +03:00
Marko Viitanen	ffbc178cf9	An attempt to fix checksums	2018-07-27 14:38:05 +03:00
Marko Viitanen	84b6a61193	Hack to fix split flag model for PCM use -> valid VVC bitstream	2018-07-27 14:29:31 +03:00
Marko Viitanen	90174f1143	Add more values to cabac debugging	2018-07-27 13:59:54 +03:00
Marko Viitanen	c6572d644f	Updated split_flag initialization to support Large CTUs in VVC	2018-07-27 12:32:45 +03:00
Marko Viitanen	7abadaafe4	Disable CTU splitting and configure max CTU sizes to 64x64	2018-07-27 11:04:21 +03:00
Marko Viitanen	6921e31502	Fix debugging functions	2018-07-27 11:03:16 +03:00
Marko Viitanen	37b5ce3d33	Change configurations to ease VVC debugging, max-BT-depth = 0	2018-07-26 16:12:11 +03:00
Marko Viitanen	792da1b7e0	Force PCM coding and fix PCM sample output	2018-07-26 11:05:31 +03:00
Marko Viitanen	5d4a2a004f	Remove dependent slice, wpp/tile and scaling list parameters from PPS	2018-07-26 10:43:21 +03:00
Marko Viitanen	31a6cbfe6d	Disable sign bit hiding	2018-07-26 10:41:35 +03:00
Marko Viitanen	9f2b429c66	Disable some features not used in VVC - Part mode coding not used - split transform flag not used - last significant coeff pos swapping not used	2018-07-26 10:33:27 +03:00
Marko Viitanen	e84276f7f6	Fixed version string	2018-07-26 08:17:55 +03:00
Marko Viitanen	e38109d102	Enable QTBT and set correct general_profile_idc for Next	2018-07-25 12:24:17 +03:00
Marko Viitanen	079ca9b8b2	Disable tile/wpp flags in slice header	2018-07-25 11:19:53 +03:00
Marko Viitanen	b0ac7002e5	Disable VPS	2018-07-25 11:02:09 +03:00
Marko Viitanen	c5bf6a3774	Bugfix: add missing parameters to WRITE_U	2018-07-25 10:18:48 +03:00
Marko Viitanen	9befe35961	Modify slice header to conform to VVC	2018-07-25 10:17:42 +03:00
Marko Viitanen	95ce1e1a25	Modify parameter sets to conform to VVC	2018-07-25 10:05:11 +03:00
Arttu Ylä-Outinen	83555c3d6d	Enable --fast-residual-cost with fastest presets	2018-07-16 12:31:20 +03:00
Arttu Ylä-Outinen	c438bb4a19	Add an option to skip CABAC for residual costs Adds command line option --fast-residual-cost=<limit>. When QP is below the limit, estimates the cost of coding the residual coefficients from the sum of absolute coefficients. Skipping CABAC is not worth it with high QPs because there are fewer coefficients so CABAC is not as slow.	2018-07-16 12:31:20 +03:00
Reima Hyvönen	a4bf77f208	Tested some extract functions	2018-07-12 09:29:32 +03:00
Reima Hyvönen	c05033a893	Even more useless vectors removed	2018-07-11 15:09:14 +03:00
Reima Hyvönen	884cb77238	Removed some not used vectors	2018-07-11 15:06:11 +03:00
Reima Hyvönen	792689a5ff	Removed for-loops, added extract instead	2018-07-11 14:56:41 +03:00
Reima Hyvönen	f9c7f6ee66	Added some break-operations for avx2 optimization	2018-07-11 14:15:38 +03:00
Reima Hyvönen	cc064da143	some more optimization for bipred	2018-07-11 11:27:54 +03:00
Reima Hyvönen	9a339eef89	Merge branch 'bipred_recon' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into HEAD # Conflicts: # build/kvazaar_lib/kvazaar_lib.vcxproj	2018-07-10 16:21:04 +03:00
Reima Hyvönen	a22cf03ddb	Updated to have no movement function to avx2 strategies	2018-07-10 16:07:15 +03:00
Arttu Ylä-Outinen	b7474eb532	Fix SAO buffer sizes Increases sizes of buffers used for SAO reconstruction to avoid stack buffer overflow in AVX2 SAO reconstruction.	2018-07-05 15:56:30 +03:00
Arttu Ylä-Outinen	b37470e80f	Merge pull request #207 from jbeich/maltivec Unbreak build on PowerPC if AltiVec isn't supported	2018-07-04 11:06:41 +03:00
Reima Hyvönen	ea83ae45f0	Working solution	2018-07-03 11:18:51 +03:00
Jan Beich	4f4bea7496	Check -maltivec is supported before using PowerPC target may lack or have non-standard FPU: $ cc -dumpmachine powerpcspe-undermydesk-freebsd $ cc -c -maltivec -Isrc src/strategies/altivec/picture-altivec.c src/strategies/altivec/picture-altivec.c:1: error: AltiVec and E500 instructions cannot coexist	2018-07-02 23:25:23 +00:00
Jan Beich	b892d820f8	Clean up macOS includes on powerpc* after `93e1c9f1c3` strategyselector.c:426:25: machine/cpu.h: No such file or directory	2018-07-02 21:52:45 +00:00
Reima Hyvönen	17babfffa4	25.6 working optimization, ~50% faster than original	2018-06-25 17:06:16 +03:00
Arttu Ylä-Outinen	2f995f4325	Merge pull request #205 from jbeich/powerpc Unbreak build on non-Linux powerpc*	2018-06-19 13:28:00 +03:00
Arttu Ylä-Outinen	c1398ef818	Permit --period=1 with any GOP structure All intra coding is a special case so it can be permitted even though Kvazaar normally only supports intra periods that are divisible by the GOP length.	2018-06-18 12:26:11 +03:00
Arttu Ylä-Outinen	abdebe0bf9	Fix --owf help message The number of parallel frames is --owf plus one, not --owf minus one. Fixes #204.	2018-06-18 09:33:36 +03:00
Jan Beich	93e1c9f1c3	Add AltiVec detection for BSDs strategyselector.c:377:26: linux/auxvec.h: No such file or directory	2018-06-17 15:38:24 +00:00
Miika Metsoila	98972d26c2	Document that the high tier requires level 4 or higher	2018-06-14 12:41:03 +03:00
Miika Metsoila	62b44efaa4	Write the encoding tier (main/high) into the bitstream	2018-06-14 12:41:03 +03:00
Arttu Ylä-Outinen	a343f6d587	Prepare for delta QPs at CU-level - Replaces lcu_dqp_enabled with max_qp_delta_depth in encoder_control_t. - Fixes set_cu_qps so that it can handle quantization groups of arbitrary size. - Fixes computation of QP predictors so that it works for quantization groups of arbitrary size.	2018-06-13 15:36:19 +03:00
Arttu Ylä-Outinen	dc6b2024ea	Modify reference count asserts to fix data races Changes asserts on the reference count of objects to assert the value after KVZ_ATOMIC_INC instead of directly checking the value. Fixes some data races detected by TSan.	2018-06-12 09:35:07 +03:00
Ari Lemmetti	4fb1c16c61	Add early termination for intra rdo when a zero coefficient block is found.	2018-06-08 21:03:07 +03:00
Ari Lemmetti	492529fb7a	Add the same comment to help message as well...	2018-05-30 14:13:15 +03:00
Ari Lemmetti	0d5972bf03	Add missing sort to intra transform split search so mode at 0 is the best	2018-05-21 13:10:38 +03:00
Sebastien Alaiwan	954bca7d6e	Fix memset parameter	2018-05-17 11:24:49 +02:00
Jaakko Laitinen	f9466efcbb	Close file on error	2018-05-15 11:50:16 +03:00
Reima Hyvönen	9fed29f950	optimization for inter_recon_bipred	2018-04-18 15:25:44 +03:00
Arttu Ylä-Outinen	5c585c4fbc	Update help message Updates the default option values to match the medium preset.	2018-04-03 10:40:37 +03:00
Arttu Ylä-Outinen	2b4e22111a	Update presets The new presets are slower but have better coding efficiency.	2018-04-03 10:37:30 +03:00
Arttu Ylä-Outinen	7185519a1b	Update command line help - Adds missing default values. - Adds help for --crypto and --key. - Adds help for --rd=3. - Adds help for --sao options. - Some changes to help wording.	2018-03-23 14:33:04 +02:00
Arttu Ylä-Outinen	3606860504	Add --no-cpuid option Equivalent to --cpuid=0.	2018-03-23 12:32:27 +02:00
Arttu Ylä-Outinen	fb462b25ef	Fix transform skip for inter The transform skip flag in cu_info_t was stored under the intra substruct even though transform skip can be used for inter as well. This caused bitstream errors. Fixed by moving the flag out of the substruct.	2018-03-20 11:01:33 +02:00
Arttu Ylä-Outinen	b64e46707d	Skip raster scan step in TZ search Raster scan is very slow and the BD-rate improvement is marginal.	2018-03-01 14:04:03 +02:00
Arttu Ylä-Outinen	6877064230	Add zero neighborhood check to TZ search Adds an additional grid search step that starts from the zero motion vector after the normal grid search. The search range for this step is half of the normal range.	2018-03-01 14:02:13 +02:00
Arttu Ylä-Outinen	74a413c46a	Switch to star refinement in TZ search	2018-03-01 13:06:14 +02:00
Arttu Ylä-Outinen	ebee428ee1	Add loop termination to TZ grid search Terminates the grid search if no better motion vector was found in the last three iterations.	2018-03-01 13:06:06 +02:00
Arttu Ylä-Outinen	4c175621dd	Fix TZ grid search and star refinement - Changes TZ grid search and star refinement to keep the origin constant instead of moving to the best position after each iteration. - Changes star refinement to loop until there is no more improvement, instead of running the step only once.	2018-03-01 12:56:57 +02:00
Arttu Ylä-Outinen	9c2d0074a2	Add rounding of motion vectors in inter search When the starting point for integer motion estimation was selected among the merge candidates, the candidate motion vectors were always rounded down. This commit changes the rounding so that they are rounded to the nearest integer MV instead.	2018-03-01 09:39:21 +02:00
Ari Lemmetti	662430d441	Select CU type based on SSD, transform unit tree and mode cost of luma and chroma on --rd=2	2018-02-22 19:26:48 +02:00
Arttu Ylä-Outinen	cb06cfeadb	Drop temporary arrays in bipred search Changes bipred search to use the original source and reconstruction arrays directly instead of copying them.	2018-02-14 11:20:51 +02:00
Arttu Ylä-Outinen	0ea516ba30	Move bipred search to a separate function	2018-02-14 09:56:53 +02:00
Arttu Ylä-Outinen	6f506be12d	Drop dynamic allocation from bipred search Moves the temporary LCU struct used in bipred search from the heap to the stack. The single malloc call was a huge bottleneck in bipred.	2018-02-14 09:55:02 +02:00
Arttu Ylä-Outinen	7155dd0db7	Add negative references to L1 list Changes reference index list creation so that the negative references are added to L1 in addition to L0 when biprediction is enabled and no reordering of pictures is done. Biprediction can now be used with the low-delay GOP structure.	2018-02-07 14:54:52 +02:00
Arttu Ylä-Outinen	4b24cd03a2	Update for crypto++ 6.0.0 compatibility Changes the crypto module to use unsigned char instead of byte. The byte typedef is no longer included in the global namespace in crypto++ 6.0.0. See https://github.com/weidai11/cryptopp/issues/442. Fixes #184.	2018-02-05 13:35:03 +02:00
Arttu Ylä-Outinen	8c53417006	Check zero coefficient cost for inter Checks the cost of flushing all coefficients of an inter block to zero. This is much faster than doing full RDOQ but can still reduce bitrate significantly. Encoding speed is increased since fewer coefficient bits have to be coded with CABAC.	2018-01-29 12:41:56 +02:00
Arttu Ylä-Outinen	018b5ffa64	Move inter CU reconstruction to a new function Moves code for reconstructing all PUs in an inter CU to a new function kvz_inter_recon_cu in inter.c.	2018-01-24 15:05:39 +02:00
Arttu Ylä-Outinen	405b8c1069	Refactor inter MVD cost functions Moves duplicate code for writing the MVD of a single motion vector from kvz_get_mvd_coding_cost_cabac and encoder_inter_prediction_unit to a new function.	2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen	c1cca1ad7f	Refactor inter MV candidate selection Moves duplicate code for checking the best MV candidate from functions calc_mvd_cost, search_pu_inter_ref and search_pu_inter to a new function.	2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen	9067aa4535	Remove an unnecessary copy in SMP/AMP search SMP/AMP search is performed using a lower work tree level than the normal inter search so the prediction info must be copied up if an SMP/AMP mode is chosen. Previously pixels and coefficients were copied as well. Changed to only copy prediction info.	2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen	89a930d6dd	Add part mode bitcost when using SMP/AMP blocks	2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen	fc43643ba5	Use a transform split for SMP and AMP blocks	2018-01-18 10:36:25 +02:00
Arttu Ylä-Outinen	c74ede148b	Fix CBF flags for 4x4 luma blocks CBF flags were not being propagated to the upper level from blocks of size 4x4.	2018-01-18 10:36:25 +02:00
Arttu Ylä-Outinen	0a69e6d18f	Fix selection of transform function for 4x4 blocks DST function was returned for inter luma transform blocks of size 4x4 even though they must use DCT. Fixed by checking the prediction mode of the block in addition to whether it is chroma or luma.	2018-01-18 10:36:25 +02:00
Miika Metsoila	bcedfd6669	Remove the usage of errno in me-steps argument parsing	2018-01-16 14:38:43 +02:00
Miika Metsoila	39ed36830e	Merge branch 'me_steps'	2018-01-16 14:22:59 +02:00
Miika Metsoila	61213e3ad9	Improve step parameter parsing and usage	2018-01-10 15:16:52 +02:00
Arttu Ylä-Outinen	649113a821	Fix inter search being used for 4x4 blocks When 4x4 intra blocks are enabled and inter search is limited to 16x16 and larger blocks, it is possible that inter search is accidentally done for 4x4 blocks. Fixed by checking that block size is at least 8x8 before doing inter search.	2018-01-10 14:21:48 +02:00
Miika Metsoila	e8e0e7596a	Add a step-cutoff parameter for motion estimation search	2017-12-22 14:04:25 +02:00
Miika Metsoila	4e13608b01	Merge branch 'diamond_search'	2017-12-18 14:11:53 +02:00
Miika Metsoila	2cde0d1a18	Document diamond search option	2017-12-12 14:45:01 +02:00
Miika Metsoila	b923b63b42	Add diamond search	2017-12-12 14:40:14 +02:00
Ari Lemmetti	14892fda00	Replace simple coefficient cost estimation with CABAC. Substantial improvement. Approximation proved to be too inaccurate while not giving actually that much speedup.	2017-12-10 01:23:48 +02:00
Miika Metsoila	ea79069dc8	Fix a type warning in encmain.c	2017-12-08 16:22:40 +02:00
Miika Metsoila	6aa4cd7528	Fix type warnings	2017-12-08 16:16:36 +02:00
Miika Metsoila	b3486b5114	Fix gcc/clang warnings and errors in cfg.c	2017-12-08 16:09:00 +02:00
Miika Metsoila	bac07457ea	Merge branch 'hevc_level'	2017-12-08 15:57:38 +02:00
Miika Metsoila	c67a24e6ec	Update readme and --help text	2017-12-07 12:32:46 +02:00
Ari Lemmetti	713e694d82	Define HAVE_STRUCT_TIMESPEC on Visual Studio 2015 and later Fixes redefinition of timespec that Pthreads-Win32 does even if it has been already defined.	2017-12-05 18:26:12 +02:00
Miika Metsoila	f64d42169f	Improve bitrate checking to accommodate non-integer and less than 1 framerates	2017-12-01 17:20:12 +02:00
Miika Metsoila	57cf92d35f	Implement level's bitrate limit checking during encoding	2017-11-28 16:19:44 +02:00
Miika Metsoila	021fb27787	Add high-tier flag	2017-11-20 16:05:28 +02:00
Miika Metsoila	d249059d61	Minor refactoring of level checking	2017-11-20 13:25:26 +02:00
Arttu Ylä-Outinen	cf85d52b9d	Kvazaar version 1.2.0	2017-11-17 15:23:33 +02:00
Miika Metsoila	4c1512e8c5	Add a check for maximum picture width and height for the given level	2017-11-15 16:39:59 +02:00
Arttu Ylä-Outinen	4cb054295a	Fix linkers Overrides the linkers used for kvazaar, libkvazaar.la and kvazaar_tests. When crypto++ is enabled, the C++ linker is used and when it is disabled, the C linker is used. This removes the need to explicitly specify -lstdc++ in configure when crypto++ is used and fixes the build with crypto++ when libstd++ is not installed.	2017-11-13 15:09:38 +02:00
Miika Metsoila	f9a4aba867	Update documentation, fix input fps default value, remove 0 as default level	2017-11-09 16:53:31 +02:00
Miika Metsoila	ebba0a4f01	Test if input conforms to its level's limits (excluding bitrate)	2017-11-08 16:15:41 +02:00
Miika Metsoila	fb4d0c3cf2	Move level argument parsing to the correct place and give it initial values	2017-11-03 15:47:35 +02:00
Miika Metsoila	61a31054e1	Add level command-line parameter	2017-11-03 13:04:05 +02:00
Arttu Ylä-Outinen	9974380cdd	Fix bipred and temporal MVP - Fixes two errors in calculating the POC for the reference frame for temporal candidate MV scaling. - Fixes using the MV for the wrong direction when the temporal MV predictor block uses bi-prediction. Fixes #160.	2017-10-25 12:26:41 +03:00
Arttu Ylä-Outinen	841597e123	Fix picture and slice types Changes handling of intra pictures for --gop=8 so that every picture with POC divisible by the intra period is intra. The first picture is IDR and the rest of the intra pictures are CRA. POC is not reset at CRA pictures. The leading pictures that follow the CRA picture are changed to RASL so they are allowed to refer to pictures before the CRA picture. Changes inter slice types to P when the L1 reference list is empty and to B otherwise. In all-intra, all pictures are now IDR pictures with POC zero.	2017-10-20 13:35:26 +03:00
Jaakko Laitinen	957b6850c3	Change ref list printout to match hm decoded printout	2017-09-25 13:48:56 +03:00
Arttu Ylä-Outinen	20aea8df63	Fix POCs when using --gop=8 When using --gop=8 with an intra period greater than one, a single POC would be skipped before every intra frame. This commit fixes the problem by turning the intra frames into BLA frames with leading pictures when using --gop=8.	2017-09-19 09:31:58 +03:00
Miika Metsoila	6e00f63469	Remove unused variables from search_pu_inter_ref function	2017-09-18 15:36:37 +03:00
Miika Metsoila	7b0101ce3d	Merge branch 'reflist_changes' # Conflicts: # src/encoderstate.c # src/search_inter.c	2017-09-18 14:59:37 +03:00
Miika Metsoila	769b17768d	Change max function to MAX macro for clang/gcc compatibility. Remove couple of unnecessary comments	2017-09-15 14:21:51 +03:00
Miika Metsoila	5f7c5443a3	Remove inter.poc	2017-09-12 14:23:19 +03:00
Miika Metsoila	6bd78a3da7	Reverse L0 list sort direction	2017-09-12 14:23:18 +03:00
Miika Metsoila	83dc7e7f50	Made L0 to sort and fixed mv_ref_coded in search_pu_inter	2017-09-12 14:23:18 +03:00
Timothe FRIGNAC	d3362a238e	changed strtod to strtol	2017-08-31 15:14:31 +02:00
Timothe FRIGNAC	3a1ab54ff0	Fixed memory leaks	2017-08-31 11:51:41 +02:00
Timothe FRIGNAC	466297fd77	Fixed build error	2017-08-29 17:01:18 +02:00
Timothe FRIGNAC	2e130912cb	Add --key opt	2017-08-28 17:15:13 +02:00
Miika Metsoila	a5f4cf09b5	Switched from storing POCs in inter.poc to state->frame->refLXs array	2017-08-21 16:34:57 +03:00
Arttu Ylä-Outinen	409d2114f0	Fix motion vector constraints Fixes integer motion vectors being constrained more than what was necessary when using --mv-constraint or --wpp.	2017-08-11 14:41:36 +03:00
Arttu Ylä-Outinen	7144a00beb	Rewrite thread queue Changes thread queue so that only the jobs that are ready to run are stored in the queue. Other jobs are kept track of by pointers in the reverse dependency lists of other jobs. When a job is ready to run it is appended to the queue. The job queue is stored as a linked list. The definitions of threadqueue_queue_t and threadqueue_job_t are moved to the .c file, turning them into opaque structs. Makes thread queue code simpler. Fixes some TSan errors.	2017-08-11 14:18:12 +03:00
Arttu Ylä-Outinen	bc47fe94af	Drop thread queue debug code	2017-08-11 14:18:12 +03:00
Eemeli Kallio	e5cbc7a205	--sao now enables full sao	2017-08-11 13:26:55 +03:00
Eemeli Kallio	4c3453d26f	Fixed issue with no-sao argument	2017-08-11 13:12:22 +03:00
Eemeli Kallio	8674c0f5ee	Added parameter for band and edge sao.	2017-08-11 11:57:09 +03:00
Eemeli Kallio	d9b93ea368	Added possibility to skip edge or band sao.	2017-08-11 11:51:49 +03:00
Arttu Ylä-Outinen	4b73bdd9aa	Skip checked motion vectors in early termination Changes the second iteration of early termination to skip the motion vectors that were already checked in the first iteration.	2017-08-09 14:29:09 +03:00
Arttu Ylä-Outinen	606d441362	Skip computing MV cost twice in hexagon search Changes the first step of hexagon search to skip the zero offset since the cost of the motion vector has already been computed.	2017-08-09 14:29:09 +03:00
Arttu Ylä-Outinen	fa4648061d	Add mv, cost and bitcost to inter_search_info_t	2017-08-09 14:29:08 +03:00
Arttu Ylä-Outinen	328f051d7f	Put inter search parameters in a single struct Adds struct inter_search_info_t for holding the parameters that are used by most functions related to inter search. Passing the parameters in a single struct greatly reduces the number of parameters for many functions.	2017-08-09 14:27:53 +03:00
Miika Metsoila	0dd069f8af	Fixed using wrong POC in add_temporal_candidate	2017-08-09 13:50:21 +03:00
Miika Metsoila	25e0a954c7	Fixed 2 bugs causing incorrect video output	2017-08-09 13:50:21 +03:00
Arttu Ylä-Outinen	24ecddd2a5	Fix wrong strides in SAO reconstruction Functions kvz_sao_reconstruct and encoder_sao_reconstruct used frame->width as the stride instead of frame->rec->stride when accessing frame->rec->data. This caused errors when using tiles and SAO.	2017-08-01 15:40:49 +03:00
Arttu Ylä-Outinen	f0bf959d17	Fix alignment errors in 32-bit build with MSVC Changes the work_tree parameter in search.c functions from an array to a pointer. Fixes "formal parameter with requested alignment of 8 won't be aligned" errors.	2017-07-28 09:27:02 +03:00
Arttu Ylä-Outinen	9694bd2fae	Fix build on 32-bit systems Function coeff_abs_sum_avx2 that was added in `e950c9b` was outside the AVX2 #if directive.	2017-07-28 09:19:29 +03:00
Arttu Ylä-Outinen	ecb0275cdd	Store CU arrays as pointers to the main array Changes field state->tile->frame->cu_array->data to point to the CU array in the main encoder state. Removes the need to copy the CU array to the main CU array after search.	2017-07-28 08:36:45 +03:00
Arttu Ylä-Outinen	e950c9b101	Add AVX2 implementation for coefficient sum	2017-07-28 07:39:36 +03:00
Arttu Ylä-Outinen	d50ae6990c	Add sum of absolute coefficients to strategies	2017-07-28 07:39:15 +03:00
Arttu Ylä-Outinen	59faca0646	Skip CABAC coefficient cost for --rd=0	2017-07-28 07:33:03 +03:00
Arttu Ylä-Outinen	19e051ea40	Reduce intra threshold Reduces intra threshold for --rd=0 from 20 to 8. Threshold of 20 increased BD-Rate too much.	2017-07-25 13:26:38 +03:00
Arttu Ylä-Outinen	e9cf15465e	Fix inter cost in bipred The cost of coding MV ref indices and MV direction was added to bitcost but not inter cost. Fixed by adding the extra bits to inter as well.	2017-07-24 15:24:04 +03:00
Arttu Ylä-Outinen	edbe00763e	Drop extra parameter in kvz_image_calc_sad Drops the parameter max_lcu_below which was always set to -1.	2017-07-24 15:21:19 +03:00
Arttu Ylä-Outinen	ffac29061f	Fix extrapolated inter SATD	2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen	631ef53d2a	Fix inter cost calculations Inter costs are computed using SAD except when fractional motion estimation or bi-prediction is enabled. This commit changes search_pu_inter_ref to recalculate the cost with SATD. Fixes inter/intra cost comparisons since intra costs are always SATD costs.	2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen	6ce2fb1238	Add pixel offsets to encoder_state_config_tile_t Adds fields offset_x and offset_y to encoder_state_config_tile_t.	2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen	2380ba0d41	Reduce copying in kvz_get_coeff_cost Changes function kvz_get_coeff_cost to only copy the CABAC contexts and not the whole encoder state. Other threads could be simultaneously using the other parts of the encoder state. Only copying the CABAC fixes a TSan data race warning.	2017-07-24 12:38:41 +03:00
Arttu Ylä-Outinen	24b462f801	Align coefficients to 8 bytes Adds alignment attribute to lcu_coeff_t. The coefficients are sometimes handled as 64-bit integers containing four coefficients so the arrays should be aligned to 8 bytes. Fixes a UBSan error about misaligned reads.	2017-07-24 12:37:37 +03:00
Arttu Ylä-Outinen	5ddb43c6fe	Fix undefined left shifts in rdo Replaces left shifts by multiplications when the operand may be a negative value. Left shift of a negative value is undefined behavior.	2017-07-24 12:35:10 +03:00
Arttu Ylä-Outinen	d1e64ad62b	Fix undefined left shifts Replaces left shifts by multiplications when the operand may be a negative value. Left shift of a negative value is undefined behavior.	2017-07-20 11:15:30 +03:00
Arttu Ylä-Outinen	07b5fb9caf	Fix out-of-bounds read in encoderstate When calling encoder_state_encode_leaf with POC 0, index -1 of the GOP array would be accessed. Fixed by skipping the code for I-frames.	2017-07-20 11:15:30 +03:00
Arttu Ylä-Outinen	8c4a3473a8	Change --owf=auto and --threads=auto selection Changes OWF selection so that it is chosen based on the maximum number of parallel CTUs. Number of threads is limited to prevent overhead from extra threads.	2017-07-20 09:42:28 +03:00
Arttu Ylä-Outinen	4fc9b743c1	Drop an unnecessary pthread_cond_broadcast Drop pthread_cond_broadcast on threadqueue->cond in function kvz_threadqueue_waitfor. The broadcast caused threads to be woken up more often than necessary.	2017-07-19 11:09:30 +03:00
Arttu Ylä-Outinen	14003c6a30	Disable printing PSNR with --no-psnr	2017-07-19 10:38:37 +03:00
Arttu Ylä-Outinen	e90bde5c62	Clarify PSNR output Adds letters Y, U and V to the PSNR output to make it clearer that the printed values are the luma and chroma PSNR.	2017-07-19 10:33:43 +03:00
Arttu Ylä-Outinen	fdb3480b54	Enable strategies for SAO reconstruction Re-enables strategies for SAO reconstruction. They were disabled in commit `ec9ff42`.	2017-07-11 10:35:18 +03:00
Arttu Ylä-Outinen	333dba3884	Add static to SAO strategies	2017-07-11 10:02:01 +03:00
Miika Metsoila	e8cc2d8f6a	Small fixes	2017-07-07 13:58:19 +03:00
Arttu Ylä-Outinen	67a60a35e3	Fix invalid calls to normalize_lcu_weights Changes encoder_state_init_new_frame to only call normalize_lcu_weights when the weights have been written to the array and rate control is enabled. When rate control is disabled, the weights are not used.	2017-07-07 11:05:31 +03:00
Arttu Ylä-Outinen	563bc26e71	Fix out-of-bounds read in AVX2 SAO AVX2 version of SAO loaded offsets with a 256 bit read even though there are only five 32 bit integers.	2017-07-06 13:04:52 +03:00
Arttu Ylä-Outinen	0850b17f96	Drop get_wpp_limit in search_inter WPP limit for motion vectors is now computed inside fracmv_within_tile.	2017-07-05 13:22:53 +03:00
Arttu Ylä-Outinen	2a85f0f5a4	Move hard-coded MV limits to encoder_control_t Adds field max_inter_ref_lcu to encoder_control_t. It is used to set up inter-LCU dependencies in encoder_state_encode_leaf and restrict motion vectors in fracmv_within_tile.	2017-07-05 13:22:53 +03:00
Arttu Ylä-Outinen	bb5354f7e2	Relax inter-CTU dependencies when SAO is off When using WPP and OWF, the first CTU of a row depends on the last CTU of the row below in the reference frame. This is necessary when SAO is enabled since we currently do SAO for a whole CTU row at a time. When SAO is disabled, however, it is unnecessary to wait for the whole row. Changes CTUs to depend only on the CTU below in the reference frame instead of the whole row when WPP and OWF are enabled and SAO disabled. Gives a significant speedup when running on a machine with many CPU cores.	2017-07-05 13:21:06 +03:00
Arttu Ylä-Outinen	1efa2708b2	Do SAO reconstruction for a single CTU at a time Moves SAO reconstruction into encoder_state_worker_encode_lcu instead of doing it in a separate step for the whole CTU row. Reconstruction of the rightmost 10 pixels and bottommost 10 pixels of a CTU is delayed until the neighboring CTU has been deblocked. Doing SAO for the whole CTU row at a time caused unnecessary inter-CTU dependencies when using WPP and OWF. The first CTU of a row would need to wait until SAO was done for the row below in the previous frame. Moving SAO reconstruction to immediately after deblocking each CTU fixes this problem.	2017-07-04 15:14:31 +03:00
Arttu Ylä-Outinen	ec9ff42077	Rewrite SAO recon to handle arbitrary sized blocks Adds width and height parameters to function kvz_sao_reconstruct and changes it to take coordinates in units of pixels. This will be useful for doing SAO for areas smaller than a whole CTU.	2017-06-30 16:09:18 +03:00
Miika Metsoila	dcd7acf4fd	Fixed crash and incorrect info output	2017-06-27 16:05:15 +03:00
Miika Metsoila	f8b6234fdb	Changes to reference lists to behave more like L0/L1 lists from the specification	2017-06-27 16:05:15 +03:00
Arttu Ylä-Outinen	2c66e0bbd2	Fix warnings about invalid reads in AVX2 ipol AVX2 filter functions read pixels in chunks of 8 or 16 bytes. At the end of the block, the read goes out of the bounds of the pixels array. The extra pixels do not affect the result. Fixes valgrind complaining about the invalid reads by allocating 5 extra pixels in kvz_get_extended_block_avx2	2017-06-22 09:37:55 +03:00
Arttu Ylä-Outinen	4d20e156db	Fix handling intra period not multiple of GOP length With low delay GOP structure, it is possible to use an intra period that is not a multiple of the GOP structure length. Commit `00c9f52` changed encoder_state_init_new_frame to reset POC on intra frames. GOP offset, however, was not reset, resulting in invalid POCs and references for the following frames. This commit changes function kvz_encoder_feed_frame so that GOP offset is correctly reset on intra frames.	2017-06-22 09:29:00 +03:00
Arttu Ylä-Outinen	00c9f52bd4	Fix setting picture type when using GOP Changes encoder_state_init_new_frame to set intra frame pictype to KVZ_NAL_IDR_W_RADL even when using GOP.	2017-06-21 13:21:47 +03:00
Arttu Ylä-Outinen	f54a25f112	Fix crash when immediately closing encoder When closing the encoder, the pictures stored in the input frame buffer are freed by repeatedly calling kvz_encoder_feed_frame. If the encoder was closed immediately after opening it, kvz_encoder_feed_frame would be called with an unprepared encoder state. This would trigger an assert. Fixed by changing kvz_encoder_feed_frame so that it does not require the encoder state to be prepared.	2017-06-15 11:57:46 +03:00
Arttu Ylä-Outinen	b74e0458fd	Set inter transform depth to zero Sets max_transform_hierarchy_depth_inter to 0 in SPS. This saves some bits because split_transform_flag does not need to be coded for inter blocks. When SMP and AMP blocks are enabled the depth is set to 1 instead. Otherwise inter split flag would default to 1 for SMP and AMP blocks, resulting in an unnecessary transform split.	2017-06-08 10:08:20 +03:00
Arttu Ylä-Outinen	8dd01ba5a9	Refactor helper functions in search Combines functions lcu_set_intra_mode and lcu_set_inter_pu to a single function. Removes some duplicated code.	2017-06-06 10:32:09 +03:00
Arttu Ylä-Outinen	1bbecf7584	Refactor work tree copy functions Extracts common code shared by work_tree_copy_up and work_tree_copy_down to a separate function.	2017-06-06 10:32:00 +03:00
Arttu Ylä-Outinen	2b169d5d63	Fix crash in kvazaar_close Changes kvazaar_close to stop all threads before freeing encoder states. Fixes a crash when the encoder is closed before all pictures have been encoded.	2017-06-02 10:05:33 +03:00
Arttu Ylä-Outinen	eb9a05b7ef	Fix memory leak Changes kvazaar_close to free the remaining pictures in the input frame buffer. Fixes a memory leak when the encoder is closed while there are pictures left in the buffer.	2017-06-01 15:39:35 +03:00
Arttu Ylä-Outinen	8b2483ca1c	Combine intra reconstruction functions Replaces function kvz_intra_recon_lcu_luma and kvz_intra_recon_lcu_chroma in intra.c with function kvz_intra_recon_cu. The new function can handle reconstruction for both luma and chroma. Removes some duplicated code.	2017-05-24 12:07:31 +03:00
Arttu Ylä-Outinen	e67fdb853d	Move intra leaf TB recon to a separate function Moves code for intra leaf transform block reconstruction from functions kvz_intra_recon_lcu_luma and kvz_intra_recon_lcu_chroma to a new function intra_recon_tb_leaf. Removes some duplicated code.	2017-05-24 12:07:31 +03:00
Arttu Ylä-Outinen	13d2fdbd21	Drop unused kvz_videoframe_get_cu functions	2017-05-24 11:15:31 +03:00
Arttu Ylä-Outinen	f5eef7f33c	Use luma pixel coordinates in encode_coding_tree Changes functions encode_intra_coding_unit and encode_coding_tree to take coordinate arguments in units of luma pixels instead of 8 px blocks. This should make the code easier to understand.	2017-05-24 11:15:31 +03:00
Arttu Ylä-Outinen	525a5180ff	Combine intra CU encoding functions Merges functions encode_intra_coding_unit and encode_intra_coding_unit_encry. Removes a lot of duplicated code.	2017-05-24 11:12:40 +03:00
Arttu Ylä-Outinen	610c91b0c5	Use luma pixel coordinates in TU coding functions Changes functions encode_transform_unit and encode_transform_coeff to take coordinate arguments in units of luma pixels instead of 4 px blocks. This should make the code easier to understand.	2017-05-23 15:36:16 +03:00
Arttu Ylä-Outinen	2e8838de6e	Fix crash when crypto compiled in but disabled When kvazaar was built with crypto++ but running without using encryption features, kvazaar attempted to delete an uninitialized crypto handle. Fixed by setting the handle to NULL in kvz_encoder_state_init.	2017-05-23 14:01:48 +03:00
Arttu Ylä-Outinen	2f2c281e8e	Fix a memory leak in crypto A CryptoPP::CFB_Mode<CryptoPP::AES>::Encryption was allocated at the beginning of encoder_state_encode_leaf and was never freed. This commit changes encoder_state_worker_encode_lcu to delete the CFB_Mode. Also moves crypto handle from encoder_state_config_tile_t to encoder_state_t so that it can be safely deleted without affecting other threads in the same tile.	2017-05-23 11:51:25 +03:00
Arttu Ylä-Outinen	22155950c1	Rewrite crypto to conform to kvazaar code style	2017-05-23 11:51:25 +03:00
Arttu Ylä-Outinen	6829865190	Fix inline declaration in intra_mode_encryption Moves the inline declaration of intra_mode_encryption before the type and changes it to use the INLINE macro. Inline declaration after type triggered a warning on GCC.	2017-05-23 11:50:32 +03:00
Arttu Ylä-Outinen	5f8e17d4ba	Eliminate a race condition in threadqueue Fixes the order of acquiring locks for the job and its dependency in kvz_threadqueue_job_dep_add. The dependency is locked before the job that depends on it. This is the same order as in threadqueue_worker. Acquiring the locks in different order in kvz_threadqueue_job_dep_add and threadqueue_worker would sometimes result in a deadlock.	2017-05-18 12:25:53 +03:00
Arttu Ylä-Outinen	4b213477f0	Return best MV from inter early terminate When using --me-early-termination=sensitive, early termination of inter search used to always return the starting point if no tested motion vector was good enough to continue the search. This commit changes early_termination to always return the best motion vector and cost found.	2017-05-18 09:05:14 +03:00
Arttu Ylä-Outinen	382636de55	Fix handling too large QPs Changes kvz_config_validate to output an error if the given QP is out of range and changes kvz_set_picture_lambda_and_qp to clip the QP to the valid range if it is too large after applying QP offset from GOP structure.	2017-05-17 12:41:51 +03:00
Arttu Ylä-Outinen	de8b59c681	Drop unused function kvz_coefficients_blit	2017-05-12 16:48:30 +03:00
Arttu Ylä-Outinen	bcfa5a3cd9	Add a comment explaining the coefficient order	2017-05-12 16:46:57 +03:00
Arttu Ylä-Outinen	95775a1645	Change coefficient storage order Changes coefficient storage order to a zig-zag order. Reduces unnecessary copying of coefficients to temporary arrays.	2017-05-12 16:46:57 +03:00
Arttu Ylä-Outinen	9395867a9a	Quantize all colors in a single traversal Changes kvz_quantize_lcu_residual to process all three colors in a single traversal of the TU tree.	2017-05-12 16:42:41 +03:00
Arttu Ylä-Outinen	1e58fd6b16	Split kvz_quantize_lcu_residual Splits kvz_quantize_lcu_residual to two functions that handle the TU tree recursion and quantization of a single TU.	2017-05-12 16:42:41 +03:00
Arttu Ylä-Outinen	cc87e0dcc7	Combine luma and chroma quantization functions Replaces functions kvz_quantize_lcu_luma_residual and kvz_quantize_lcu_chroma_residual in transform.c with function kvz_quantize_lcu_residual. The new function can handle any of the YUV colors. Removes some duplicated code.	2017-05-12 16:42:41 +03:00
Arttu Ylä-Outinen	1357dd0599	Pass coeffs through encoder state Changes the way coefficients are passed from kvz_search_lcu to kvz_encode_coding_tree. Drops fields coeff_y, coeff_u and coeff_v in videoframe_t and instead passes them through field coeff in encoder_state_t.	2017-05-12 16:42:41 +03:00
Eemeli Kallio	2cad3173ec	Reduced amount of modes for search_intra_rdo	2017-05-12 15:56:07 +03:00
Arttu Ylä-Outinen	26adef4492	Merge branch 'erp-aqp'	2017-05-12 15:05:24 +03:00
Eemeli Kallio	55e0e65733	Added INLINE to kvz_get_ic_rate and kvz_get_coded_level in rdo.c	2017-05-12 15:03:30 +03:00
Arttu Ylä-Outinen	ee3d4d0e78	Add adaptive QP for 360 degree video Adds option --erp-aqp for enabling adaptive QP for 360 degree video with equirectangular projection. When projected into a spherical surface, the middle part of the video covers relatively larger area than the top and bottom parts. Enabling --erp-aqp sets up a ROI delta QP array which uses higher QPs for the top and bottom of the video and lower QPs for the middle part.	2017-05-11 12:31:53 +03:00
Arttu Ylä-Outinen	79cb3a2fd3	Permit negative QP deltas in ROI Delta QPs should not be arbitrarily restricted to positive values.	2017-05-11 12:13:47 +03:00
Arttu Ylä-Outinen	edfbd6f122	Add field lcu_dqp_enabled to encoder_control_t Delta QPs for LCUs are enabled when either ROI coding or rate control is enabled. Having a single field is simpler than always checking whether ROI or rate control is enabled.	2017-05-11 12:13:47 +03:00
Arttu Ylä-Outinen	2f2405dfe6	Fix crash when PU depth is limited When video width or height was not a multiple of the smallest CU size, no prediction would be performed at the border CUs. Kvazaar would later crash at an assertion failure when attempting to write the bitstream for the CU. Fixed by permitting inter and intra prediction when the CU split is forced, even if CUs of that size would otherwise be disabled.	2017-04-27 10:35:48 +03:00
Arttu Ylä-Outinen	9130b5107c	Change handling of infinite PSNR in encmain Changes encmain to print 999.99 as PSNR when SSE is zero. This behavior is in line with HM. Previously SSE was set to 99 when it was zero.	2017-04-27 10:35:13 +03:00
Arttu Ylä-Outinen	a9c878b535	Fix crash with WPP when threads are disabled When WPP is enabled, a reference to SAO reconstruction job is copied from the wavefront to the main encoder state. However, when threads are disabled, the job is a null pointer and dereferencing it crashes the encoder. Fixed by adding a null pointer check.	2017-04-24 12:59:57 +03:00
Arttu Ylä-Outinen	2991962033	Add reference counting to threadqueue_job_t Both the thread queue and the encoder states hold pointers to the thread queue jobs. It is possible that a job is removed from the thread queue and freed while the encoder state is still using it. This commit adds reference counting to threadqueue_job_t in order to fix the problem. Fixes #161.	2017-04-12 16:13:52 +03:00
Arttu Ylä-Outinen	bd8adff43a	Drop unused defines in threads.h	2017-04-12 03:41:07 -07:00
Arttu Ylä-Outinen	7ab0a7aff2	Fix semaphores on Mac POSIX semaphores are deprecated on Mac. This commit replaces POSIX semaphores by Grand Central Dispatch semaphores when building on Mac.	2017-04-12 03:41:02 -07:00
Arttu Ylä-Outinen	26693e1402	Fix reliance on undefined behaviour in encmain Pthread mutexes were used for synchronization in encmain by locking and unlocking them from different threads. However, according to the POSIX standard, unlocking a mutex from a different thread is undefined behaviour. This commit replaces the mutexes by semaphores which can be used from different threads.	2017-04-12 03:23:58 -07:00
Ari Lemmetti	47a9f0de04	Modify and use FILL_ARRAY macro to prevent warning on GCC 7 Following warning was given and is false positive error: 'memset' used with length equal to number of elements without multiplication by element size [-Werror=memset-elt-size]	2017-04-11 14:04:25 +03:00
Eemeli Kallio	f7e01b8ba1	Fixed error on rd=3	2017-04-05 13:27:14 +03:00
Eemeli Kallio	9f605152ae	Changed intra to use best rough cost when using inter and rd=2	2017-04-05 13:01:32 +03:00
Ari Lemmetti	33ce101ab5	Revert "Use sizeof(uint32_t) to avoid warning in GCC7." Did not fix the problem. This reverts commit `e3c3e74926`.	2017-04-03 20:21:33 +03:00
Ari Lemmetti	e3c3e74926	Use sizeof(uint32_t) to avoid warning in GCC7. error: 'memset' used with length equal to number of elements without multiplication by element size [-Werror=memset-elt-size]	2017-04-03 19:16:09 +03:00
Arttu Ylä-Outinen	df359b8f95	Fix indentation in encode_coding_tree.c Fixes indentation of a for loop that was causing a misleading indentation warning on GCC. Fixes #163.	2017-03-08 22:56:28 +09:00
Pierre-Loup Cabarat	2b8ce5e47c	Add intra prediction modes encryption	2017-03-06 17:27:39 +01:00
Arttu Ylä-Outinen	aae141f2d3	Fix order of frames with --debug When the decoding and presentation orders of pictures are different (with GOP), the frames in YUV debug output would be in the decoding order. This commit changes the kvazaar command line program to store the reconstructed pictures in a buffer so that they can be output in the presentation order. Fixes #101.	2017-02-28 14:09:24 +09:00
Arttu Ylä-Outinen	094b39e7fc	Refactor inter MV/merge candidate selection Adds struct merge_candidates_t for holding the spatial and temporal merge candidates. Changes functions with separate parameters for each candidate to use the struct instead.	2017-02-22 15:56:36 +09:00
Arttu Ylä-Outinen	3409748a8f	Refactor inter MVP candidate selection Adds helper function add_mvp_candidate.	2017-02-22 15:56:27 +09:00
Arttu Ylä-Outinen	ef6503c728	Refactor inter merge candidate selection Adds helper function add_merge_candidate and replaces macro CHECK_DUPLICATE with function is_duplicate_candidate.	2017-02-22 02:50:52 +09:00
Arttu Ylä-Outinen	f12e09bc40	Refactor inter TMVP selection Adds helper function add_temporal_candidate to inter.c.	2017-02-22 02:08:10 +09:00
Arttu Ylä-Outinen	4f88066740	Refactor MV and merge candidate selection Replaces macros APPLY_MV_SCALING and CALCULATE_SCALE with helper functions.	2017-02-22 01:14:16 +09:00
Arttu Ylä-Outinen	db08041d9a	Refactor inter TMVP selection Merges three if-clauses to remove two levels of indentation.	2017-02-21 23:56:01 +09:00
Marko Viitanen	85e2a40da3	Clip scaled motion vectors, scale and td/tb values to appropriate limits Fixes #158.	2017-02-20 15:40:20 +02:00
Ari Koivula	7369f25f64	Bump version to 1.1.0	2017-02-16 20:52:05 +02:00
Ari Lemmetti	b021d2244e	Reduce more unnecessary initializations.	2017-02-16 17:25:26 +02:00
Ari Lemmetti	acd12cba1e	Remove unnecessary memory initialization to zero Values in interval [last_scanpos, 0] are overwritten in following for loop, except for the sig_coeff_inc value.	2017-02-16 16:48:48 +02:00
Ari Koivula	7ff33e1bf2	Fix default reference picture count The default was 3, instead of the intended 1 of the medium preset.	2017-02-13 17:34:28 +02:00
Marko Viitanen	4251607c04	Fix a bug in TMVP reference POC list	2017-02-13 15:19:24 +02:00
Marko Viitanen	4270d451e6	Fixed some errors after rebase	2017-02-13 15:19:24 +02:00
Marko Viitanen	95effb00d0	Disable TMVP in frames with zero L0 references	2017-02-13 15:19:24 +02:00
Marko Viitanen	b4de1878be	Fixed TMVP scaling and candidate selection for B-frames	2017-02-13 15:19:23 +02:00
Marko Viitanen	23be633ad7	Added TMVP merge candidate scaling for L0	2017-02-13 15:19:23 +02:00
Marko Viitanen	e6aa1b9b9a	Renamed get_mv_cand_from_spatial() to get_mv_cand_from_candidates()	2017-02-13 15:19:23 +02:00
Marko Viitanen	1124bb5fd0	Cleaned up TMVP, mv candidate selection working, merge candidate selection not	2017-02-13 15:19:23 +02:00
Marko Viitanen	d65d2ec88d	WIP: add list of POCs used in the image when pushing to reference	2017-02-13 15:19:22 +02:00
Marko Viitanen	6a25cd3248	WIP: work on tmvp on inter	2017-02-13 15:19:22 +02:00
Marko Viitanen	e538a94eda	Enable TMVP with B-frames	2017-02-13 15:19:22 +02:00
Arttu Ylä-Outinen	363b8b49a2	Fix integer overflows with large resolutions Limits video size so that the number of luma and chroma pixels can be stored in an int. Fixes some integer overflows that resulted in segmentation faults.	2017-02-12 11:40:13 +09:00
Arttu Ylä-Outinen	a5a925fc28	Replace timed waits by normal waits in threadqueue Replaces calls to pthread_cond_timedwait with pthread_cond_wait in threadqueue.c. Simplifies code, as there should be no need for the timeout.	2017-02-11 15:42:03 +09:00
Arttu Ylä-Outinen	fd057498fc	Simplify kvz_config_alloc	2017-02-11 15:42:03 +09:00
Arttu Ylä-Outinen	7f7844caad	Fix finalizing uninitialized encoder states Finalization functions for frame and tile encoder states accessed the frame and tile fields of the encoder state even though they might be NULL. This is the case when the initialization of an encoder state fails. Fixed by adding NULL checks.	2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen	51786eda67	Drop redundant fields in encoder_control_t Some of the fields in encoder_control_t were simply copies of the corresponding fields in kvz_config. This commit drops the copied fields in favor of using the fields in encoder_control_t.cfg directly.	2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen	6a178dee96	Fix leaking memory when --cqmfile given many times Any previously allocated CQM file name was not freed when allocating memory for the new file name.	2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen	63a567ad8a	Fix leaking memory when --roi given many times Any previously allocated delta QP array was not freed when allocating a new array.	2017-02-09 14:05:21 +09:00
Arttu Ylä-Outinen	bfd89136a4	Fix ROI delta QP array not getting freed	2017-02-09 13:23:55 +09:00
Arttu Ylä-Outinen	e78a8dfcf5	Copy the kvz_config passed to encoder_open The kvz_config struct is created by the user but kvazaar keeps a pointer to it. It is easy to break things by modifying the configuration outside kvazaar. In addition, kvazaar modifies the struct even though it has a const modifier. This commit changes the field cfg in encoder_control_t to be a copy of the kvz_config struct instead of a pointer, removing modifications to the const struct and allowing users to do whatever they want with it after opening the encoder.	2017-02-09 13:23:54 +09:00
Ari Koivula	b8e3513a23	Fix crash with sub-LCU frame sizes and WPP The end of slice was being calculated incorrectly, which led to no tile being created inside the slice, which led to an assert triggering. This fixes the wrong end of slice calculation, but also disallows wavefront rows from being created, if there would be only one. The wavefront initialization code assumes there are always more than one row, so the inter-frame dependency doesn't get added properly. Fixes #153.	2017-02-08 21:41:30 +02:00
Ari Koivula	d893474bab	Fix encoder getting stuck on OS-X Main thread was stuck looping on pthread_cond_timedwait because the abs time given on OS-X had already passed and the wait returned immediately without releasing the mutex to allow worker threads to proceed. Fix was to use the gettimeofday, which returns real time instead of monotonic, which is what pthread_cond_timedwait wants.	2017-02-02 17:27:46 +02:00
Ari Koivula	4ceda1908b	Fix OS-X compiler warning rdo.c:475:25: warning: absolute value function 'abs' given an argument of type 'int64_t' (aka 'long long') but has parameter of type 'int' which may cause truncation of value [-Wabsolute-value] current.cost = -abs(quant_cost_in_bits) + (bits << PRECISION_INC); ^ rdo.c:475:25: note: use function 'llabs' instead current.cost = -abs(quant_cost_in_bits) + (bits << PRECISION_INC);	2017-02-01 18:09:17 +02:00
Ari Koivula	c7d536bbcd	Fix OS-X compiler warning cfg.c:1024:74: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'unsigned long long' [-Wformat] fprintf(stderr, "Too large ROI size: %llu (maximum %zu).\n", size, SIZE_MAX);	2017-02-01 18:09:04 +02:00
Ari Koivula	4467506ef1	Add missing kvz_ prefix	2017-01-31 18:38:02 +02:00
Ari Koivula	ed3bd898fd	Remove Exp-Golomb lookup table This table takes 256kB and isn't used very much. Au revoir!	2017-01-31 18:31:05 +02:00
Ari Koivula	5513744d24	Merge branch 'slices'	2017-01-31 16:14:30 +02:00
Ari Koivula	52904d3e9f	Add --slices=tiles and --slices=wpp This encapsulates tiles or WPP rows into their own slices, making it possible to send them as soon as they are done, instead of waiting for the other substreams to finish and coding the substream offsets in the slice header.	2017-01-31 15:44:23 +02:00
Ari Koivula	0d4d0e869c	Add support for independent slices Not used yet, but they work.	2017-01-31 15:11:50 +02:00
Ari Koivula	46ae382498	Fix bugs with slice header These fixes allow more than one slice to be used to code a picture. - Use correct number of bits to code the slice segment address. - Don't offset_len_minus1 for slices without substreams.	2017-01-31 14:01:59 +02:00
Ari Koivula	f1fc0de2bf	Write slice headers to the parent stream Appending to the child stream doesn't work if the child is a leaf slice state. Simplifies flow by removing distinction between tile and slice. Now that slice headers are written in the parent stream, there is zero difference between tiles and slices from bitstream point of view.	2017-01-31 13:55:05 +02:00
Ari Koivula	04cd875b2c	Move substream finalization to LCU coding job Having some of the termination bits in the LCU coding and some in the substream finalization was needlessly confusing. Doing substream finalization directly after LCU coding makes it easy to verify that the finalization is done correctly. Removes one job per WPP row from the job queue. Removes kvz_cabac_flush, because I don't like bits being put into the bitstream implicitly. Better to have it all in the open.	2017-01-31 13:01:57 +02:00
Ari Koivula	ead490b7b7	Write a new slice NAL for every slice	2017-01-31 12:36:18 +02:00
Ari Koivula	cd496bf50b	Move first_nal_in_au to encoder_state->frame Needed for writing NALs from encoder_state_write_bitstream_children	2017-01-31 12:28:28 +02:00
Arttu Ylä-Outinen	1e6463c08b	Fix inter bipred search When the number of merge candidates was five, biprediction search would read past the bounds of the priority list arrays. Fixed to limit the search to the first four candidates.	2017-01-31 18:23:12 +09:00
Ari Lemmetti	2c069a3e5f	Prevent unnecessary cu search Prevent further analysis as soon as it is known that splitting can not improve cost	2017-01-30 16:21:41 +02:00
Arttu Ylä-Outinen	9b889c3fab	Fix reading ROI files - Checks the return value of fopen when opening the ROI file. Fixes a segfault when the file cannot be opened. - Check that the width and height are positive. Fixes reading past the end of the delta QP array in kvz_set_lcu_lambda_and_qp. - Check for overflow in width * height. Fixes an overflow resulting in a segfault. - Properly check that fscanf succeeds. Fixes silently accepting ROI files that are too short. - Properly close the FILE pointer.	2017-01-29 18:57:27 +09:00
Arttu Ylä-Outinen	46c9a483c3	Fix inter search for small SMP and AMP blocks The function search_pu_inter_ref incorrectly rounded the coordinates of the block down to a multiple of 8 pixels. Small SMP and AMP blocks may start at coordinates that are not multiples of 8. Fixed by removing the rounding. Fixes a failing assert when --mv-constraint is used with --smp or --amp.	2017-01-29 13:34:50 +09:00
Arttu Ylä-Outinen	fb10b56b82	Fix checking if a low delay GOP structure is used Stops assuming that having cfg->gop_lowdelay set means that GOP structure is used since it is possible that cfg->gop_lowdelay is true but cfg->gop_len is zero. Adds checks for cfg->gop_len where needed. Fixes a possible division by zero in kvz_encoder_feed_frame.	2017-01-28 21:56:00 +09:00
Arttu Ylä-Outinen	4f56b04239	Drop an unnecessary conditional Drop a conditional for depth > MAX_DEPTH in search_cu. The depth cannot be greater than MAX_DEPTH (== 3) since an earlier if-clause checks that it is less than MAX_PU_DEPTH (== 4).	2017-01-28 21:35:27 +09:00
Ari Koivula	937a764987	Fix bug in --mv-constraint Subpixel motion estimation returns a 0-vector when no subpixel vector is within the constraint. Fix is to not call subpixel motion estimation when the integer vector is not within the constraint.	2017-01-26 09:55:57 +02:00
Ari Koivula	4a0121ac42	Add --roi parameter Adds region of interest coding capability. Works by reading a file of delta QP values which will then be applied to each frame at LCU level.	2017-01-26 09:14:14 +02:00
Ari Koivula	6f61836989	Refactor kvz_rdoq_sign_hiding Rename and reorder everything to make more sense. - Moved input tables into their own struct and renamed them to what they actually represent. - Renamed pretty much every variable to conform to our style and to make sense. - Removed the lastCG stuff, as the function already gets passed the last coeff anyway. (it was named width, what the hell?)	2017-01-19 23:58:17 +02:00
Ari Koivula	a85390d0ac	Clean up code using the fixed point frac bit tables This is to prepare for changing the code using the floating point table to use the fixed point table instead. This also allows reducing the size of the fractional part, which was useful for finding every place where the fixed point presentation is relied upon.	2017-01-19 20:20:51 +02:00
Ari Koivula	24a69c7467	Refactor luma deblocking Changes luma deblocking to use gather and scatter instead of reading to and writing from here and there in memory. Should make them faster and easier to vectorize, or at least cleaner. Splits strong and weak luma deblocking to two functions, as they have almost nothing in common.	2017-01-17 22:13:39 +02:00
Ari Koivula	4cb2fca924	Refactor deblock decision	2017-01-17 19:34:17 +02:00
Arttu Ylä-Outinen	05794c3548	Add missing static to function lambda_to_qp	2017-01-11 15:53:55 +09:00
Arttu Ylä-Outinen	ee518e8ac4	Take header bits into account in rate control	2017-01-11 15:53:55 +09:00
Arttu Ylä-Outinen	c219d3cd94	Fix deblock when CU QP delta is enabled Fixes deblock functions so that they use the correct QP for the filtered edge. Adds field qp to cu_info_t.	2017-01-11 15:53:22 +09:00
Arttu Ylä-Outinen	82a98180e4	Clip LCU lambda to reduce quality fluctuation Limits lambdas for each LCU based on the computed lambda from the previous frame and the frame-level lambda.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	93172fd251	Use separate alpha, beta and lambda for each LCU Changes rate control to use the alpha and beta values stored in lcu_stats_t instead of the frame-level values when selecting lambda and QP for an LCU.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	3af4e9cc8a	Allocate bits separately for each LCU Bits are allocated based on the costs of the LCUs in the previous completely coded frame. Breaks deblock when rate control is used.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	ff5e5ec6d4	Record info about coded LCUs Adds field lcu_stats to encoder_state_config_frame_t. The following data is recorded for each LCU: - number of bits - squared cost - used lambda value - alpha parameter used for rate control - beta parameter used for rate control	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	2a4243acbe	Refactor rate control Moves all code related to setting QP and lambda values to rate_control module.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	71633889ce	Enable CU QP delta when using rate control When rate control is enabled, enable cu_qp_delta_enabled_flag in PPS with diff_cu_qp_delta_depth set to 0. Also adds code for writing the QP deltas and a new cabac context.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	640ff94ecd	Use separate lambda and QP for each LCU Adds fields lambda, lambda_sqrt and qp to encoder_state_t. Drops field cur_lambda_cost_sqrt from encoder_state_config_frame_t and renames cur_lambda_cost to lambda.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	435c387357	Refactor rate control - Defines MIN_LAMBDA and MAX_LAMBDA constants. - Moves resetting state->frame->cur_gop_bits_coded to rate_control.c. - Changes gop_allocate_bits to return the number of bits allocated like pic_allocate_bits does.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	6c4f2d196a	Move fields from encoder_state_t to frame Moves fields prepared and frame_done from encoder_state_t to encoder_state_config_frame_t.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	97863cdaa2	Fail encoder init when CQM file cannot be opened	2017-01-08 19:17:43 +09:00
Arttu Ylä-Outinen	db5e750c7f	Fix --threads=auto When --threads=auto was given on the command line, cfg->threads was actually set to zero, disabling threads altogether. Fixed to set cfg->threads to -1, so that the number of threads is chosen automatically.	2017-01-08 17:58:22 +09:00
Ari Koivula	a9e45efcfc	Add a fast lane for byte-aligned bitstream writes The CABAC engine only writes to the bitstream when it has a full byte. These writes are also always byte-aligned, so there is no need to even check for stream alignment. Speedup was around 3% with ultrafast and low QP.	2016-12-23 17:01:44 +02:00
Jaakko Laitinen	deb63f735f	Fix gop disabling	2016-12-20 14:25:13 +02:00
Ari Lemmetti	70a52f0e48	10-bit: add missing bit depth adjustment to ssd	2016-11-17 19:28:04 +02:00
Ari Koivula	fa078102f1	Fix 32bit compilation Got a warning about implicit cast from uint64_t to void*.	2016-11-17 17:53:57 +02:00
Ari Koivula	5ceec06bd3	Merge pull request #148 from Venti-/crypto Crypto	2016-11-16 21:33:55 +02:00
Ari Lemmetti	c31207ea7d	Optimize intra reference building -Add function with reduced logic for the most common case	2016-11-16 18:28:42 +02:00
Ari Koivula	24f2a23ef8	Remove unnecessary crypto state The frame does not need its own crypto state, since it always has at least one sub tile.	2016-11-16 13:58:41 +02:00
Ari Koivula	8951e34fd2	Change crypto.h stubs to print instead of assert	2016-11-16 13:58:41 +02:00
Wassim Hamidouche	ea82c38906	correct memory allocation	2016-11-16 12:35:28 +02:00
Wassim Hamidouche	da3e2d1d07	resolve parallel encryption	2016-11-16 12:35:28 +02:00
Ari Koivula	b8a618e666	Fix problems with >8 bit input Enforce bit depth promised by --input-bitdepth to avoid crashes when larger values are provided. Do endianness byte swap for all bytes when the buffer gets extended to multiple of 8 pixels, and not just the number of input pixels. Don't swap bytes on a little-endian system.	2016-11-13 19:58:54 +02:00
Ari Koivula	2c005cda25	Fix bug with sub-pixel motion estimation in tiles The width of the tile was being used to index the frame pixel buffer instead of the width of the buffer.	2016-11-07 15:53:52 +02:00
Ari Koivula	78a28e0338	Reformat --help message - Reduce indentation to 6 spaces - Word wrap everything to under 80 characters - Remove defaults from options covered by presets - Add a dash in front of argument descriptions - Add --(no-) to names of parameters that accept it and remove mention of enabling or disabling - Add executable and scripts as a dependency to make docs	2016-11-04 15:40:28 +02:00
Ari Koivula	d18de19d8a	Fix DTS and PTS not being passed on through lib API Fixes "cur_dts is invalid" warning from FFmpeg.	2016-10-28 19:05:47 +03:00
Ari Koivula	0c41c2ebd6	Make CLI set PTS for each input picture This value is not represented in the HEVC bitstream, which is why it was not set previously. FFmpeg sets and needs it however, so make the CLI set it as well to make sure we handle it correctly.	2016-10-28 19:03:03 +03:00
Ari Koivula	5bf745460d	Re-categorize options in the help message - Move VUI stuff to the bottom - Merge Parallel processing, WPP, Tiles and slices - Add more categories for the other options	2016-10-27 03:26:15 +03:00
Ari Koivula	cb6672b452	Disable WPP when Tiles are enabled Closes #142.	2016-10-27 02:07:10 +03:00
darealshinji	488d042e5f	Bump KVZ_VERSION	2016-10-25 12:32:13 +02:00
Ari Lemmetti	29153ed503	Remove unused variable	2016-10-21 17:28:42 +03:00
Ari Lemmetti	778e46dfd8	Add AVX2 version of SSD	2016-10-21 15:07:53 +03:00
Ari Lemmetti	6f5d7c9e06	Move SSD to strategies	2016-10-21 15:07:23 +03:00
Ari Lemmetti	89b941eab4	Fix typo	2016-10-21 15:07:02 +03:00
Alexis Ballier	1dcc993743	Include i386 & i486 for compiling intel asm. x86_64-pc-linux-gnu-gcc -m32 that I use for building 32bits libraries on amd64 defines only __i386__.	2016-10-14 18:07:37 +02:00
Arttu Ylä-Outinen	5fb7afe8c4	Add --implicit-rdpcm command line parameter. Makes it possible to use lossless coding without implicit residual DPCM.	2016-10-03 20:01:55 +09:00
Arttu Ylä-Outinen	5affc0f527	Use implicit RDPCM in lossless mode. Sets implicit RDPCM flag in SPS when lossy coding is disabled and applies DPCM to intra residual when prediction mode is horizontal or vertical.	2016-10-03 19:31:38 +09:00
Ari Koivula	016dbe0894	Further refine presets The rd-complexity of slow presets is better with a less aggressive GOP. Adding the GOP as part of the preset improved BDRate enough, that it didn't make sense anymore to have a veryslow target the best BDRate. Instead, push that responsibility to placebo by making it a little bit faster.	2016-09-29 17:35:12 +03:00
Ari Koivula	31c5ff0f16	Add cross-platform core number detection Well, turns out pthread_num_processors_np isn't standard so we need to do this crap. Threw in hyper threading detection as a bonus.	2016-09-29 00:03:21 +03:00
Ari Koivula	8c7351eac8	Fix lp-gop with depth 1 GOPs with depth 1 had the same structure as those with depth 2: g4d3t1 = 3 2 3 1 g4d2t1 = 2 2 2 1 g4d1t1 = 2 2 2 1 It now results in the correct: g4d1t1 = 1 1 1 1	2016-09-29 00:03:21 +03:00
Ari Koivula	a395aeaac9	Set default settings to those of --preset=medium	2016-09-29 00:03:21 +03:00
Ari Koivula	4388fe0d30	Set presets to ratedistortion-complexity optimized versions	2016-09-29 00:03:20 +03:00
Ari Koivula	facb1e16df	Use -p64 -q22 and --gop=lp-g4d3t1 by default Coding inter without GOP of any kind really isn't a very sensible default. Defaulting to B-GOP of some kind would be better, but lp-gop is more robust for now.	2016-09-29 00:03:20 +03:00
Ari Koivula	d7391a9593	Improve default for number of parallel frames	2016-09-29 00:03:20 +03:00
Ari Koivula	19d423ab29	Use all available cores by default	2016-09-29 00:03:20 +03:00
Ari Koivula	3f138f087a	Allow non-gop-length --period for lp-gop	2016-09-29 00:03:19 +03:00
Ari Koivula	16790c9f15	Remove number of references from --gop=lp syntax The number of references should be part of the presets, so gop should be defined separately.	2016-09-29 00:03:19 +03:00
Ari Koivula	cbfa824d1a	Merge branch 'simd'	2016-09-27 20:49:45 +03:00
Ari Koivula	14a7bcba25	Use a faster function for clipped inter SAD Use the vectorized general SSE41 inter SAD in AVX reg_sad for shapes for which we don't have AVX versions yet. Also improves speed of --smp and --amp a lot. Got a 1.25x speedup for: --preset=ultrafast -q 27 --gop=lp-g4d3r3t1 --me-early-termination=on --rd=1 --pu-depth-inter=1-3 --smp --amp * Suite speed_tests: -PASS inter_sad: 0.898M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec) +PASS inter_sad: 2.503M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec) -PASS inter_sad: 115.054M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec) +PASS inter_sad: 133.577M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)	2016-09-27 20:48:30 +03:00
Arttu Ylä-Outinen	4313e56c2d	Add --no-rdoq-skip command line switch	2016-09-11 17:40:16 +09:00
Ari Koivula	a7a33b08ec	Remove --slice-addresses from usage message And give a warning if it's used. Slices will have to be implemented at some point, but they aren't yet so let's not advertize them.	2016-09-10 21:06:00 +03:00
Eemeli Kallio	f41e428e5f	Removed kvz_skip_unnecessary_rdoq and reworked --rdoq-skip to skip 4x4 blocks when it is on.	2016-09-09 10:26:07 +03:00
Eemeli Kallio	ed9c0b0416	RDOQ reworked in rdo.c. rdoq_signhide now skips coeffs that are after best_last_idx.	2016-09-09 10:16:51 +03:00
Ari Koivula	02cd17b427	Add faster AVX inter SAD for 32x32 and 64x64 Add implementations for these functions that process the image line by line instead of using the 16x16 function to process block by block. The 32x32 is around 30% faster, and 64x64 is around 15% faster, on Haswell. PASS inter_sad: 28.744M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec) PASS inter_sad: 7.882M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec) to PASS inter_sad: 37.828M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec) PASS inter_sad: 9.081M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)	2016-09-01 21:36:39 +03:00
Ari Koivula	d0512d25c6	Use fixed point in get_mvd_coding_cost	2016-08-30 21:37:12 +03:00
Ari Koivula	ec7507a935	Further optimize get_ep_ex_golomb_bitcost Unrolled 16-bit log2 calculation.	2016-08-30 21:37:01 +03:00
Ari Koivula	a4ba794587	Optimize get_ep_ex_golomb_bitcost Arrange the decision tree such that there are only 3 branches on the most common paths and the more likely branch is always fall-through. A profile guided optimization pass would probably do something similar.	2016-08-30 05:24:16 +03:00
Ari Koivula	82cfab58f8	Improve fast mvd coding cost estimation A lot of time is being taken up by this function on ultrafast, and it doesn't do a very good job. This change aims to both simplify the logic and make the estimate better. The logic is simplified by using a look up for the step mvd bit cost step function instead of mimicking the binarization process. The estimation is made better by checking fractional cabac bit costs. The new function returns the same results as kvz_get_mvd_coding_cost_cabac, but is also faster than the old function.	2016-08-30 04:55:09 +03:00
Ari Koivula	d31be8eb27	Make mvd_coding_cost functions take const cabac	2016-08-30 04:46:46 +03:00
Ari Koivula	64d631c174	Fix 8bit to 10bit input conversion regression	2016-08-25 22:09:40 +03:00
Ari Koivula	27789125d8	Fix input bit depth conversion. The input was being shifted in the wrong direction.	2016-08-25 22:05:25 +03:00
Ari Koivula	4ec039004b	Add monochrome encoding. Write the bitstream without chroma when encoding with --input-format=P400. This reduces bitstream size by 0-1% compared to coding monochrome in 420 format, and speeds up encoding slightly because chroma is not processed.	2016-08-25 20:15:26 +03:00
Ari Koivula	c5b70cf812	Add chroma format support to yuv_t	2016-08-24 19:20:53 +03:00
Ari Koivula	032ed30ff4	Add chroma format support to kvz_picture. Add picture_alloc_csp to the libkvz API to allocate pictures with a chroma format other than 420.	2016-08-24 19:20:53 +03:00
Ari Koivula	48ccc26839	Add --input-format and --input-bitdepth. Adds reading of 10-bit input for 10-bit encoding.	2016-08-24 19:20:53 +03:00
Ari Koivula	cc08073615	Refactor some indexing weirdness in init_lcu_t. I thought there might be a bug in this, so I cleaned it up.	2016-08-24 19:12:48 +03:00
Ari Koivula	b6d674d66e	Refactor integer vector inter prediction. This code was pretty bad, so I cleaned it up a bit.	2016-08-24 19:09:26 +03:00
Ari Lemmetti	28c4174d0e	Fix incorrect shuffle parameters. _MM_SHUFFLE uses reverse order.	2016-08-23 19:40:46 +03:00
Ari Lemmetti	ce77bfa15b	Replace KVZ_PERMUTE with _MM_SHUFFLE. The exact same macro already exists.	2016-08-22 19:08:46 +03:00
Jovasa	68eef660bd	Fixed search around mv_in in fullsearch not being saved.	2016-08-19 15:19:29 +03:00
Eemeli Kallio	99d8b9abeb	Renamed skip_rdoq to kvz_skip_unnecessary_rdoq. Changed the order in which it goes through CGs and tuned its sum calculation.	2016-08-18 14:02:56 +03:00
Eemeli Kallio	1fb4755f31	Added rdoq-skip to quant-generic.c	2016-08-18 12:17:54 +03:00
Eemeli Kallio	d20ac03ca2	Added --rdoq-skip option	2016-08-18 12:17:53 +03:00
Marko Viitanen	83cf801664	Fixed MV constraint condition in bipred	2016-08-18 08:53:17 +03:00
Marko Viitanen	5ae1c595f2	Fixed slice_temporal_mvp_enabled_flag and disabled TMVP with tiles. slice_temporal_mvp_enabled_flag should also be signalled with non-IDR I-slices.	2016-08-10 14:51:41 +03:00
Marko Viitanen	5326519182	TMVP cleanup and const qualifier fixes	2016-08-10 14:10:43 +03:00
Marko Viitanen	f40907260d	Added config parameter for TMVP and cmdline option --no-tmvp. TMVP is enabled by default and cannot be used with GOP at the moment.	2016-08-10 14:09:29 +03:00
Marko Viitanen	fd52dac1f7	Fixed TMVP scaling	2016-08-10 14:09:28 +03:00
Marko Viitanen	c664bc8cf7	Added flag collocated_ref_idx to the slice header	2016-08-10 14:09:28 +03:00
Marko Viitanen	c5f2611a38	Fixes for TMVP to work with the new CU array	2016-08-10 14:09:28 +03:00
Marko Viitanen	d85af5755b	TMVP working when only 1 ref frame	2016-08-10 14:09:28 +03:00
Marko Viitanen	39f0165efe	Fix a bug in TMVP: the reference cu_array was being overwritten.	2016-08-10 14:09:27 +03:00
Marko Viitanen	adab8c327e	Clean TMVP code	2016-08-10 14:09:20 +03:00
Marko Viitanen	5fa8226ac9	Temporal merge candidate selection	2016-08-10 14:09:20 +03:00
Marko Viitanen	f83042f4a1	Temporal MV candidate selection	2016-08-10 14:09:19 +03:00
Marko Viitanen	f8671581e3	Implemented function kvz_inter_get_temporal_merge_candidates()	2016-08-10 14:09:19 +03:00
Marko Viitanen	2956bdb379	Added flag slice_temporal_mvp_enabled_flag	2016-08-10 14:09:19 +03:00
Arttu Ylä-Outinen	2a946bd88e	Rename encoder_state_t.global to frame. "Frame" is more accurate than "global" since, when OWF is used, the encoder states for each frame have their own struct.	2016-08-10 13:22:36 +09:00
Arttu Ylä-Outinen	5fbb0a8c27	Fix includes	2016-08-10 13:05:40 +09:00
Arttu Ylä-Outinen	aabf6ca3ee	Extract encoding code from encoderstate.c. Moves functions kvz_encode_coding_tree and kvz_encode_coeff_nxn from encoderstate.c to encode_coding_tree.c.	2016-08-09 22:16:50 +09:00
Arttu Ylä-Outinen	803f29be8f	Remove reconstructed picture allocation in lossless. Changes encoder_set_source_picture to set the reconstructed picture to a copy of the source picture instead of allocating a new picture when lossless coding is used.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	aaec473a19	Refactor encoder state initialization. - Moves allocation of the reconstructed picture after the source picture is set. - Extracts main state initialization from encoder_state_new_frame into a separate function. - Changes kvz_encoder_feed_frame to return the frame. - Renames some functions to better match their purpose.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	cd7024b3a5	Skip computing SSD when using lossless coding. The SSD is always zero since it is lossless.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	fbbe5d1844	Use kvz_pixels_calc_ssd for SSD in search.c. Replaces loops for computing SSDs by calling kvz_pixels_calc_ssd in search.c.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	22cc97ffb1	Fix missing field initializers.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	06b82bf888	Disable filters, trskip and signhide in lossless. When lossless coding is used, deblock and SAO are skipped, transform skip flag is not written and sign hiding is not used.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	97451ec401	Align assignments in encoder.c.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	1dc94663c3	Bypass transform and quantization with --lossless. When --lossless is given, set cu_transquant_bypass_flag for every CU and bypass transform and quantization by directly copying reference pixels to reconstruction and the residual to coefficients.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	2113b0182d	Enable PPS-level tq bypass flag with --lossless. Sets transquant_bypass_enable_flag to true in PPS when --lossless is given.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	a5897bbece	Make cabac context initialization tables static.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	23e7d9bb37	Add --lossless command line parameter.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	5372ea432f	Update README and manpage.	2016-08-03 14:25:08 +09:00
Ari Lemmetti	6bcba004ff	Comment out to fix unused code error on clang.	2016-07-14 14:12:16 +03:00
Ari Lemmetti	c0979ebdcb	Implement AVX2 luma sampling	2016-07-14 12:53:02 +03:00
Ari Lemmetti	6244560426	Add avx2 strategy for kvz_filter_frac_blocks_luma.	2016-07-14 12:53:02 +03:00
Ari Lemmetti	9c4e9e049b	Load only what is needed. Eliminate latency from hadds.	2016-07-14 12:53:01 +03:00
Ari Lemmetti	7f71cb423a	Check 4 fractional pixel positions simultaneously	2016-07-14 12:52:24 +03:00
Ari Lemmetti	ad445ab8a1	Transition to kvz_filter_frac_blocks_luma	2016-07-14 12:51:02 +03:00
Ari Lemmetti	fccfbd2f28	Add strategy for kvz_filter_frac_blocks_luma	2016-07-14 12:51:02 +03:00
Ari Lemmetti	e9c3074d32	Add buffers and definitions for upcoming filtering. Samples are to be filtered in separate blocks instead of making one big picture with interpolated pixels.	2016-07-14 12:51:02 +03:00
Ari Lemmetti	7afe7e963b	Use fme_level to control the search accuracy.	2016-07-14 12:51:01 +03:00
Ari Lemmetti	5fa323bf25	Skip searching best hpel twice. Make hpel and qpel loops similar.	2016-07-14 12:51:01 +03:00
Ari Lemmetti	bc98a9affa	Change the search order to suit lighter fme search	2016-07-14 12:51:01 +03:00
Ari Lemmetti	2b0c8db349	Add quad satd for avx2	2016-07-14 12:50:24 +03:00
Ari Lemmetti	0ff69fd6f8	Add any size multi satd	2016-07-14 12:48:37 +03:00
Ari Lemmetti	d17b9e7d6e	Allow subme parameters 0-4. Update usage, presets, defaults, lib version.	2016-07-12 19:49:38 +03:00
Arttu Ylä-Outinen	62ad57d0bf	Fix kvz_image_list_add for zero-sized lists. When a list does not have space for the new element, its size is doubled. If the size of the list is zero, doubling leaves it at zero, so it was never resized. Fixed to always resize the list so that the new element can be added.	2016-06-22 13:35:16 +09:00
3124 commits