hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-24 10:34:05 +00:00

Author	SHA1	Message	Date
Pauli Oikkonen	c18adc5ee0	Redo sao_band_ddistortion_avx2 Avoid branching and do the entire thing on 32 pixels at once in YMMs. Also make the sao_bands function parameter const.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	2827c3e3ab	Make calc_sao_bands less opaque	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	1bb9a079a8	Fix indentation	2019-08-07 16:35:24 +03:00
Reima Hyvönen	7bc959c7c5	3 sao functions are now working	2019-08-07 16:35:24 +03:00
Reima Hyvönen	0e0f2d3490	made to clear sum vector after it has been set to memory	2019-08-07 16:35:24 +03:00
Reima Hyvönen	f146de7acb	removed some variables to prevent memory losses	2019-08-07 16:35:24 +03:00
Reima Hyvönen	247c3a7a71	Converted signed to unsigned int	2019-08-07 16:35:24 +03:00
Reima Hyvönen	ac5c216974	Some more memory error preventing to sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	3fb1cbca35	more editing sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	afbb6fb960	some more modifications to sao_edge_ddistortion_avx2 to prevent memory failures	2019-08-07 16:35:24 +03:00
Reima Hyvönen	3496a57f7a	Edited sao_edge_ddistortion_avx2 to avoid memory overflow	2019-08-07 16:35:24 +03:00
Reima Hyvönen	267ba1d6ce	Modified sao_band_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	e70663b245	added some sub commands to avoid memory read errors	2019-08-07 16:35:24 +03:00
Reima Hyvönen	59dfb4570c	Converted some loads to load int8_t instead of ints	2019-08-07 16:35:24 +03:00
Reima Hyvönen	8b253209a8	Found false address load from calc_sao_edge_dir. Should now work like generic	2019-08-07 16:35:24 +03:00
Reima Hyvönen	50e0a47b7a	Took away __restrict	2019-08-07 16:35:24 +03:00
Reima Hyvönen	8a39eb674e	Removed c-variable from calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	bc0a36830d	Clarified some 6 pixel loads	2019-08-07 16:35:24 +03:00
Reima Hyvönen	1a8b211e05	Added break to line 170	2019-08-07 16:35:24 +03:00
Reima Hyvönen	d05e750ebe	Added some switches to prevent segmentation fault from reading	2019-08-07 16:35:24 +03:00
Reima Hyvönen	203580047d	Defined some AVX functions	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c884c738b1	Updated some commands to match the standard	2019-08-07 16:35:24 +03:00
Reima Hyvönen	b412ed2f59	Removed some setr and used loads calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c6cc063534	converted some hadd operations at calc_sao_edge_dir_avx2 to cast and extract	2019-08-07 16:35:24 +03:00
Reima Hyvönen	47ac109b10	Optimized part of sao_reconstruct_color_avx2 when sao->type == SAO_TYPE_BAND	2019-08-07 16:35:24 +03:00
Reima Hyvönen	96dc60a1ed	First working optimization	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c148aff9fb	Some optimization done to function sao_reconstruct_color_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	bf16ba6cc4	Remade sao_edge_ddistortion_avx2 and calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	79dc39a676	Some editing for sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	06ee52924e	some reconst done to calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	5fbc65d823	reconst optimization doesn't work yet	2019-08-07 16:35:24 +03:00
Reima Hyvönen	d29f834a69	Remove useless function	2019-08-07 16:35:24 +03:00
Reima Hyvönen	a232a12160	calc_sao_edge_dir_avx2 updated	2019-08-07 16:35:24 +03:00
Reima Hyvönen	b1febc02a5	sao_edge_ddistortion_avx2 now working properly	2019-08-07 16:35:24 +03:00
Reima Hyvönen	cd6092a1ec	Still too many bits, looking for where they appear	2019-08-07 16:35:24 +03:00
Reima Hyvönen	7853be8eeb	Incomplete optimization	2019-08-07 16:35:24 +03:00
Ari Lemmetti	40609aa865	Add missing headers to Makefile.am	2019-07-12 19:15:51 +03:00
Ari Lemmetti	5db3a78499	Bump versions for release 1.3	2019-07-09 22:09:32 +03:00
Ari Lemmetti	d513ab1999	Add missing newline	2019-07-09 21:06:05 +03:00
Ari Lemmetti	4967072625	Do not bypass search on skip cu if early_skip is not enabled	2019-07-09 20:20:12 +03:00
Ari Lemmetti	b20992a9f3	Rename functions to be more descriptive	2019-07-09 20:20:11 +03:00
Ari Lemmetti	a348a0ec23	Fix transform depth in early skip	2019-07-09 20:05:48 +03:00
Pauli Oikkonen	8d48bee180	Tidy fast coeff cost code	2019-07-09 18:01:54 +03:00
Pauli Oikkonen	201a43b08e	Clean up the RD-estimation code	2019-07-09 18:01:54 +03:00
Pauli Oikkonen	b111df5073	Create preliminary version of improved cost estimator	2019-07-09 18:01:54 +03:00
Ari Lemmetti	be08a87d94	Add missing parameter max-merge to the help message	2019-07-09 16:28:46 +03:00
Ari Lemmetti	d0bb9b4a6d	Add parameter max-merge to presets	2019-07-09 16:26:03 +03:00
Ari Lemmetti	4097331fd6	Early skip	2019-07-09 15:59:31 +03:00
Joose Sainio	977e885ea2	Fix issue with gop=0 introduced in `1c36f68d0c`	2019-07-05 12:57:27 +03:00
Mikko Pitkänen	a7f09c8114	Merge branch 'threadwrapper'	2019-06-24 16:54:59 +03:00
Mikko Pitkänen	3dd606ce2e	Add new threadwrapper	2019-06-18 18:45:45 +03:00
Joose Sainio	c94077d15e	remove hardcoded value	2019-06-12 14:37:41 +03:00
Joose Sainio	ac68c8444d	remove negation that wasn't supposed to be there	2019-06-12 14:35:24 +03:00
Joose Sainio	5851dcc3be	missing negation	2019-06-12 14:08:18 +03:00
Joose Sainio	1c36f68d0c	Fix owf>=9 gop=8 and add test to catch such problem in future	2019-06-12 14:04:41 +03:00
Ari Lemmetti	933ff6ed55	Merge branch 'set-qp-in-cu-fix'	2019-06-07 09:01:03 +03:00
Ari Lemmetti	c6da839002	Set lcu sqrt lambda according to lcu lambda instead of frame lambda when ROI is used	2019-05-29 18:32:10 +03:00
Ari Lemmetti	9339845e8b	Set QP completely at CU level as the name '--set-qp-in-cu' implies: move slice delta QP to CU level when using --set-qp-in-cu; separate functionality from roi	2019-05-24 20:38:39 +03:00
Pauli Oikkonen	081d16fc33	Fix intrinsics that may be missing on some systems Create a header to collect all the workarounds for missing intrinsics in one place	2019-05-23 19:59:40 +03:00
Pauli Oikkonen	87a9208db8	Eliminate cvtsi64_si128 intrinsic Apparently it'll cause Win32 builds to break because it emits the movq instruction or something..	2019-04-17 16:30:40 +03:00
Pauli Oikkonen	7175d20bb2	Still include stdint.h for non-vector builds	2019-04-15 19:36:01 +03:00
Pauli Oikkonen	1315c7e2b0	Do not compile any vector code for non-SSE4/AVX2 builds	2019-04-15 19:10:48 +03:00
Pauli Oikkonen	f5f70e7bc5	Merge branch 'sad-optimization'	2019-04-15 19:02:01 +03:00
Jan Beich	85f46e17a9	Detect AltiVec via elf_aux_info() on FreeBSD 12+	2019-04-01 13:08:04 +00:00
Jan Beich	82486255da	Simplify AltiVec detection on Linux	2019-04-01 13:08:04 +00:00
Pauli Oikkonen	6d43759604	Create a border-respecting 32-wide AVX hor_sad	2019-03-07 18:01:22 +02:00
Pauli Oikkonen	f218cecb38	Remove offending hor_sad_avx2_w32 function Consider possibly creating a non-offending AVX2 version instead, the way hor_sad_sse41_w32 works. Or maybe there's more essential work to do.	2019-03-05 22:51:41 +02:00
Pauli Oikkonen	df2e6c54fd	4-unroll hor_sad_sse41_arbitrary This may not increase perf though because it's so rarely used function, so keeping icache footprint may be more essential...	2019-03-05 22:45:23 +02:00
Pauli Oikkonen	448eacba7b	Avoid overreading block borders in hor_sad_sse41_arbitrary	2019-03-05 22:34:50 +02:00
Eemeli Kallio	c159e275b7	Merge branch 'max_merge'	2019-03-05 14:39:03 +02:00
Pauli Oikkonen	41f51c08c4	Avoid overrunning buffer in hor_sad_sse41_w32	2019-03-01 15:37:38 +02:00
Pauli Oikkonen	bcd9879359	Include quant coeff range check in non-scaling list execution path too	2019-02-27 17:26:44 +02:00
Pauli Oikkonen	24e6363f64	Remove the kvz_quant_avx2 wrapper function	2019-02-27 16:32:58 +02:00
Pauli Oikkonen	748820f3c5	Eliminate unnecessary loading of coeffs if scaling lists are off	2019-02-27 16:26:35 +02:00
Pauli Oikkonen	5994350f40	Allow quant_flat_avx2 to be used with scaling lists on	2019-02-27 16:25:59 +02:00
Eemeli Kallio	7f4e0acf41	Added check if max-merge is out of bounds	2019-02-19 13:53:42 +02:00
Pauli Oikkonen	9b0e079262	Use SSE instructions for 64-bit SADs instead of MMX VC++ seems to choke on MMX instructions	2019-02-18 20:13:33 +02:00
Pauli Oikkonen	d8b8923028	Add LGPL notices to reg_sad headers	2019-02-18 17:52:47 +02:00
Eemeli Kallio	2a40560888	some variables to const	2019-02-12 11:24:10 +02:00
Eemeli Kallio	8f8e7bb53c	Added possibility to reduce the maximum number of merge candidates.	2019-02-12 09:21:03 +02:00
Pauli Oikkonen	770db825b9	Create hor_sad_w8 and w4 epol mask the way w16 works	2019-02-06 19:34:26 +02:00
Pauli Oikkonen	aa19bcac8a	Avoid branching in creating shuffle mask in hor_sad_w16	2019-02-06 18:58:46 +02:00
Pauli Oikkonen	2d05ca8520	Remove width from constant-width hor_sad func params They should kinda know it already	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	57db234d95	Move 32-wide SSE4.1 hor_sad to picture-sse41.c It's not used by picture-avx2.c that also includes the header, so it should not be in the header	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	dd7d989a39	Implement 32-wide hor_sad on AVX2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ff70c8a5ec	Utilize horizontal SAD functions for SSE4.1 as well	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	f5ff4db01f	4-wide hor_sad border agnostic	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	35e7f9a700	Fix hor_sad w8 to work with both borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	836783dd6e	Use hor_sad_w32 for both left and right borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	69687c8d24	Modify hor_sad_sse41_w16 to work over left and right borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	51c2abe99a	Modify image_interpolated_sad to use kvz_hor_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	1e0eb1af30	Add generic strategy for hor_sad'ing a non-split width block	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	686fb2c957	Unroll arbitrary-width SSE4.1 hor_sad by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	768203a2de	First version of arbitrary-width SSE4.1 hor_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ccf683b9b6	Start work on left and right border aware hor_sad Comes with 4, 8, 16 and 32 pixel wide implementations now, at some point investigate if this can start to thrash icache	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	760bd0397d	Pad the image buffer by 64 bytes from both ends This will be necessary for an efficient and straightforward implementation of hor_sad for blocks over 16 pixels wide, because they cannot use the shuffle trick because inter-lane shuffling is so hard to do	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	c36482a11a	Fix bug in 24-wide SAD facepalm	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	f781dc31f0	Create strategy for ver_sad Easy to vectorize	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ca94ae9529	Handle extrapolated blocks with unmodified width using optimized_sad pointer	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91b30c7064	Tidy up kvz_image_calc_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	9db0a1bcda	Create get_optimized_sad func for SSE4.1	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91380729b1	Add generic get_optimized_sad implementation NOTE: To force generic SAD implementation on devices supporting vectorized variants, you now have to override both get_optimized_sad and reg_sad to generic (only overriding get_optimized_sad on AVX2 hardware would just run all SAD blocks through reg_sad_avx2). Let's see if there's a more sensible way to do it, but it's not trivial.	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	45f36645a6	Move choosing of tailored SAD function higher up the calling chain	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91cb0fbd45	Create strategy for directly obtaining pointer to constant-width SAD function	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	94035be342	Unify unrolling naming conventions	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	517a4338f6	Unroll SSE SAD for 8-wide blocks to process 4 lines at once	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	0f665b28f6	Unroll arbitrary width SSE4.1 SAD by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	cbca3347b5	Unroll 64-wide AVX2 SAD by 2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	84cf771dea	Unroll 32 and 16 wide SAD vector implementations by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	5df5c5f8a4	Cast all pointers to const types in vector SAD funcs Also tidy up the pointer arithmetic	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a711ce3df5	Inline fixed width vectorized SAD functions	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	6504145cce	Remove 16-pixel wide AVX2 SAD implementation At least on Skylake, it's noticeably slower than the very simple version using SSE4.1	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4cb371184b	Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	796568d9cc	Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4d45d828fa	Use constant-width SSE4.1 SAD funcs for AVX2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	2eaa7bc9d2	Move SSE4.1 SAD functions to separate header	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	d2db0086e1	Create constant width SAD versions for 8 and 16 pixels	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a13fc51003	Include a blank AVX2 strategy registration function even in non-AVX2 builds	2019-02-04 19:52:24 +02:00
Pauli Oikkonen	d55414db66	Only build AVX2 coeff encoding when supported ..whoops	2019-02-04 19:34:30 +02:00
Pauli Oikkonen	3fe2f29456	Merge branch 'encode-coeffs-avx2'	2019-02-04 18:52:31 +02:00
Pauli Oikkonen	722b738888	Fix more naming issues	2019-02-04 16:05:43 +02:00
Pauli Oikkonen	e26d98fb75	Rename a couple variables and add crucial comments	2019-02-04 15:57:07 +02:00
Pauli Oikkonen	f186455619	Move encode_last_significant_xy out of strategy modules It's the exact same in both AVX2 and generic, and does not seem to be worth even trying to vectorize	2019-02-04 14:55:41 +02:00
Pauli Oikkonen	3f7340c932	Fine-tune pack_16x16b_to_16x2b Avoid mm_set1 operation when it's possible to create the constant with one bit-shift operation from another instead. Thanks Intel for 3-operand instruction encoding!	2019-02-04 14:44:47 +02:00
Pauli Oikkonen	314f5b0e1f	Rename 16x2b cmpgt function, comment it better, optimize it slightly Eliminate an unnecessary bit masking to make it even more messy	2019-02-04 14:44:32 +02:00
Pauli Oikkonen	d8ff6a6459	Fix _andn_u32 to work on old Visual Studio	2019-02-01 15:34:42 +02:00
Pauli Oikkonen	26e1b2c783	Use (u)int32_t instead of (unsigned) int in reg_sad_sse41	2019-01-10 14:37:04 +02:00
Pauli Oikkonen	3a1f2eb752	Prefer SSE4.1 implementation of SAD over AVX2 It seems that the 128-bit wide version consistently outperforms the 256-bit one	2019-01-10 13:48:55 +02:00
Pauli Oikkonen	9b24d81c6a	Use SSE instead of AVX for small widths Highly dubious if this will help performance at all	2019-01-07 20:12:13 +02:00
Pauli Oikkonen	b2176bf72a	Optimize SSE4.1 version of SAD Make it use the same vblend trick as AVX2. Interestingly, on my test setup this seems to be faster than the same code using 256-bit AVX vectors.	2019-01-07 19:40:57 +02:00
Pauli Oikkonen	887d7700a8	Modify AVX2 SAD to mask data by byte granularity in AVX registers Avoids using any SAD calculations narrower than 256 bits, and simplifies the code. Also improves execution speed	2019-01-07 18:53:15 +02:00
Pauli Oikkonen	7585f79a71	AVX2-ize SAD calculation Performance is no better than SSE though	2019-01-07 16:26:24 +02:00
Pauli Oikkonen	ab3dc58df6	Copy SAD SSE4.1 impl to AVX2	2019-01-03 18:31:57 +02:00
Pauli Oikkonen	45ac6e6d03	Tidy pack_16x16b_to_16x2b comments	2019-01-03 16:37:05 +02:00
Ari Lemmetti	cd818db724	Add missing quantization and residual in cost calculation (inter rd=2).	2018-12-21 15:55:29 +02:00
Pauli Oikkonen	016eb014ad	Move packing 16x16b -> 16x2b into separate function	2018-12-20 10:51:44 +02:00
Ari Lemmetti	b234897e8a	Fix smp and amp blocks in fme and revert previous change. Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc. Calculate SATD on the 8x4, ... part	2018-12-19 21:30:53 +02:00
Pauli Oikkonen	9aaa6f260d	Fixes to enable portability	2018-12-18 20:42:09 +02:00
Pauli Oikkonen	2fdbbe9730	Move CG reordering code from quant-avx2 to shared header	2018-12-18 19:42:18 +02:00
Pauli Oikkonen	d02207306d	Create a header file for shared AVX2 code	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	361bf0c7db	Precompute >=2 coeff encoding loop with 2-bit arithmetic Who needs 16x16b vectors when you can do practically the same with 16x2b pseudovectors in 32-bit general purpose registers!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	940b0e9e6a	Require BMI2 for AVX2 build Any processor implementing AVX2 should also implement BMI2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	f66cb23d5b	Optimize greater1 encoding loop Calculating the c1 variable need not be a serial operation!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	8c8b791c35	Vectorize kvz_context_get_sig_ctx_inc	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	033261eb74	Eliminate two branches using bit magic	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c4434e8d04	Scan CG's in forward order to simplify finding last significant	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	efd097f5a5	Vectorize the coeff group loop to some extent	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	a01362e638	use the efficient method of reordering raster->scan	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	50a888e789	Use the efficient method to find first and last nz coeffs in block	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	7e9203f566	Scan coeff groups in scan order to help find last significant one	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	9a5a6fdbc7	Simplify two ifs in encode_coeff_nxn-avx2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	37a2a8bac8	See if loop can be optimized by rearranging	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	584f2f74b6	Vectorize significant coeff group scanning loop	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	1bfed73221	Add AVX2 strategy for encode_coding_tree	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c3a6f3112a	Add generic strategy group for encode_coding_tree	2018-12-18 19:41:09 +02:00
Marko Viitanen	1ef851ab4b	Disable FME on amp/smp blocks with width or height not divisible by 8	2018-12-18 10:28:21 +02:00
Joose Sainio	b71c5573f0	Merge branch 'rate_control_fix'	2018-12-17 12:39:27 +02:00
Sergei Trofimovich	68a70e45a1	x86 asm: mark stack as non-executable Gentoo's `scanelf` QA tool detects writable/executable stack of assembly-written files as: ``` $ scanelf -qRa . 0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o 0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o 0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-sad.o 0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-satd.o ``` Normally C compiler emits non-executable stack marking (or GNU assembler via `-Wa,--noexecstack`). The change adds non-executable stack marking for yasm-based assembly files. https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>	2018-12-16 11:31:56 +00:00
Reima Hyvönen	1fcc5c6a8d	Merge branch 'bipred_recon'	2018-12-11 09:59:35 +02:00
Reima Hyvönen	e4a10880f3	Added case 12 to bipred_recon no mov	2018-12-11 09:52:17 +02:00
Marko Viitanen	a4f3968e52	Fix Visual Studio errors by initializing some variables used in AVX2 signhiding	2018-12-11 09:33:26 +02:00
Ari Lemmetti	ac943147e3	Calculate satd cost for whole non-square blocks as well.	2018-12-10 17:04:29 +02:00
Pauli Oikkonen	c465578048	Add a descriptive comment to coefficient reordering	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	f78bf2ebcb	Optimize q_coefs usage for indexed fetch	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	d9591f1b49	Eliminate midway buffering of reordered coefs TODO: For some mysterious reason seems slightly slower than the buffered one	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	7fe454c51f	Optimize get_cheapest_alternative()	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	6bbd3e5a44	Optimize rearrange_512 function	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	cb8209d1b3	Vectorize transform coefficient reordering loop	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	7cf4c7ae5f	Rename "reduce" functions to hsum That's what the functions fundamentally do anyway	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	316cd8a846	Fix ALIGNED keyword and grow alignment to 64B	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	1befc69a4c	Implement sign bit hiding in AVX2	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	c5cd03497e	Require BMI and ABM instruction sets for AVX2 build AVX2 support on a processor should always imply BMI and ABM support. The lzcnt and tzcnt instructions have more suitable semantics in the corner case that source word is 0, and allow us to even handle that scenario without a branch. Apparently Visual Studio will already include this support when building with AVX2 enabled, so only the automake files need to be tweaked.	2018-12-03 15:36:32 +02:00
Reima Hyvönen	f8696b54a4	Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks that can be not equal to 8 (ie. width = 12)	2018-11-20 17:09:19 +02:00
Marko Viitanen	a5a10a33c3	Enable --scaling-list parameter and add to the documentation	2018-11-19 10:47:30 +02:00
Reima Hyvönen	710ba288db	Chroma has some problems	2018-11-15 16:42:48 +02:00
Sami Ahovainio	8f98d4aac7	Added square search	2018-11-14 14:50:31 +02:00
Marko Viitanen	6871490dd5	Simplify get_mvd_coding_cost(), only include golomb coding	2018-11-14 14:33:31 +02:00
Ari Lemmetti	a832206bb6	Replace 32-bit incompatible intrinsics	2018-11-12 18:54:33 +02:00
Ari Lemmetti	5c774c4105	Rewrite most of FME and interpolation filters Changes had to break a lot of stuff and were just squashed into this horrible code dump	2018-11-08 20:21:16 +02:00
Joose Sainio	1c8a1f24e2	Don't assume anything about bits spent	2018-11-07 16:03:38 +02:00
Joose Sainio	3471e2470d	Fix using uninitialized value for the first frame	2018-11-07 08:17:39 +02:00
Joose Sainio	d95ac11a3b	Fix rate_control for other LP-GOPS	2018-11-06 14:20:44 +02:00
Joose Sainio	67a6ba667e	Fix rate control for flat lp-gop	2018-11-06 09:38:17 +02:00
Reima Hyvönen	7406c33a42	Some more cleaning	2018-10-26 12:25:18 +03:00
Reima Hyvönen	4c71546b2e	Cleaned up some code	2018-10-26 12:19:44 +03:00
Reima Hyvönen	4fe3909e48	Switched luma to use 32-bit ints instead of 16-bit	2018-10-24 18:24:46 +03:00
Eemeli Kallio	284e73839e	Calculating zero cost moved to its own function	2018-10-16 11:02:01 +03:00
Reima Hyvönen	381e786e10	Trying to find the bug in luma	2018-10-11 18:08:41 +03:00
Marko Viitanen	c589e5ed36	Fix closed-gop frame feed, the ordering was incorrect after the first GOP	2018-10-10 11:12:03 +03:00
Reima Hyvönen	2f5f81bac3	removed the non-optimized bipred function	2018-10-09 11:19:23 +03:00
Marko Viitanen	75dce4f3ce	Fix low-delay-gop usage with --no-open-gop	2018-10-04 15:16:02 +03:00
Marko Viitanen	de71b58f76	Change closed GOP structure to include an additional IDR between GOPs	2018-10-04 11:17:03 +03:00
Reima Hyvönen	212a8e68fa	Modified to avoid memory overflow, still some bug inside luma	2018-10-02 20:23:32 +03:00
Marko Viitanen	954f07e3d7	Add --(no-)open-gop option	2018-10-02 10:05:32 +03:00
Marko Viitanen	8bef85e056	Merge branch 'set-qp-in-cu'	2018-09-03 08:33:33 +03:00
Ari Lemmetti	2fdcc2b79d	Add option --set-qp-in-cu	2018-09-03 08:32:45 +03:00
Reima Hyvönen	896034b7cf	Some renamed functions back	2018-08-28 15:31:10 +03:00
Reima Hyvönen	e8b5e6db4c	Did some merging	2018-08-28 15:26:27 +03:00
Reima Hyvönen	7de5c74434	Updated bipred_recon to work faster	2018-08-28 15:12:31 +03:00
Reima Hyvönen	47b357cca2	Comment one test	2018-08-27 18:52:14 +03:00
Reima Hyvönen	2ca99a44e8	Updated shuffle operation to be in right order	2018-08-27 18:16:38 +03:00
Marko Viitanen	b85ae3688e	Signal QP in slice header if tiles and slices=tiles are enabled Keeps the PPS constant for various purposes	2018-08-16 08:44:39 +03:00
Reima Hyvönen	508b218a12	some modifications made to prevent reading too much	2018-08-14 10:50:39 +03:00
Reima Hyvönen	1d935ee888	some useless stuff removed	2018-08-13 16:47:11 +03:00
Reima Hyvönen	ce3ac4c05e	some modifications to no_mov	2018-08-13 16:41:02 +03:00
Reima Hyvönen	15a613ae94	test if no_mov breaks testing	2018-08-13 16:02:56 +03:00
Reima Hyvönen	97a2049e58	removed pointer declaration out from switch	2018-08-10 16:42:26 +03:00
Reima Hyvönen	aa94bcedbc	Stream is now pointer	2018-08-10 16:38:49 +03:00
Reima Hyvönen	fa5b227ece	256 to 32 doesn't work, made them by hand	2018-08-10 16:01:20 +03:00
Reima Hyvönen	408dedbcc8	removed _mm256_extract_epi8 and replaced with _mm_stream	2018-08-10 15:53:26 +03:00
Reima Hyvönen	31c35091c6	_mm256_cvtsi256_si32 removed	2018-08-10 10:06:40 +03:00
Reima Hyvönen	99dc43074f	_mm256_cvtsi256_si32 breaks system, too many bits. back to extract	2018-08-10 09:59:33 +03:00
Reima Hyvönen	4f1f80b2cb	Transformed convert from 256 to cast 256 -> 128 and then convert from 128	2018-08-09 15:35:54 +03:00
Reima Hyvönen	4957555eb3	Removed leftover from 939	2018-08-09 15:25:03 +03:00
Reima Hyvönen	28b165c971	Clarified some sections, added _MM_SHUFFLE macro	2018-08-09 15:23:01 +03:00
Reima Hyvönen	dd04df8667	testing if error in both avx2 functions	2018-08-03 11:49:00 +03:00
Reima Hyvönen	ed50d71fde	Switched some variables to different location, altered inter_recon_bipred_avx2 function	2018-08-02 16:08:59 +03:00
Reima Hyvönen	f5739a0028	Renaming and removing useless prints	2018-08-02 14:47:17 +03:00
Reima Hyvönen	bc09f59bb6	Edited some definitions	2018-08-02 11:54:53 +03:00
Arttu Ylä-Outinen	83555c3d6d	Enable --fast-residual-cost with fastest presets	2018-07-16 12:31:20 +03:00
Arttu Ylä-Outinen	c438bb4a19	Add an option to skip CABAC for residual costs Adds command line option --fast-residual-cost=<limit>. When QP is below the limit, estimates the cost of coding the residual coefficients from the sum of absolute coefficients. Skipping CABAC is not worth it with high QPs because there are fewer coefficients so CABAC is not as slow.	2018-07-16 12:31:20 +03:00
Reima Hyvönen	a4bf77f208	Tested some extract functions	2018-07-12 09:29:32 +03:00
Reima Hyvönen	c05033a893	Even more useless vectors removed	2018-07-11 15:09:14 +03:00
Reima Hyvönen	884cb77238	Removed some not used vectors	2018-07-11 15:06:11 +03:00
Reima Hyvönen	792689a5ff	Removed for-loops, added extract instead	2018-07-11 14:56:41 +03:00
Reima Hyvönen	f9c7f6ee66	Added some break-operations for avx2 optimization	2018-07-11 14:15:38 +03:00
Reima Hyvönen	cc064da143	some more optimization for bipred	2018-07-11 11:27:54 +03:00
Reima Hyvönen	9a339eef89	Merge branch 'bipred_recon' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into HEAD # Conflicts: # build/kvazaar_lib/kvazaar_lib.vcxproj	2018-07-10 16:21:04 +03:00
Reima Hyvönen	a22cf03ddb	Updated to have no movement function to avx2 strategies	2018-07-10 16:07:15 +03:00
Arttu Ylä-Outinen	b7474eb532	Fix SAO buffer sizes Increases sizes of buffers used for SAO reconstruction to avoid stack buffer overflow in AVX2 SAO reconstruction.	2018-07-05 15:56:30 +03:00
Arttu Ylä-Outinen	b37470e80f	Merge pull request #207 from jbeich/maltivec Unbreak build on PowerPC if AltiVec isn't supported	2018-07-04 11:06:41 +03:00
Reima Hyvönen	ea83ae45f0	Working solution	2018-07-03 11:18:51 +03:00
Jan Beich	4f4bea7496	Check -maltivec is supported before using PowerPC target may lack or have non-standard FPU: $ cc -dumpmachine powerpcspe-undermydesk-freebsd $ cc -c -maltivec -Isrc src/strategies/altivec/picture-altivec.c src/strategies/altivec/picture-altivec.c:1: error: AltiVec and E500 instructions cannot coexist	2018-07-02 23:25:23 +00:00
Jan Beich	b892d820f8	Clean up macOS includes on powerpc* after `93e1c9f1c3` strategyselector.c:426:25: machine/cpu.h: No such file or directory	2018-07-02 21:52:45 +00:00
Reima Hyvönen	17babfffa4	25.6 working optimization, ~50% faster than original	2018-06-25 17:06:16 +03:00
Arttu Ylä-Outinen	2f995f4325	Merge pull request #205 from jbeich/powerpc Unbreak build on non-Linux powerpc*	2018-06-19 13:28:00 +03:00
Arttu Ylä-Outinen	c1398ef818	Permit --period=1 with any GOP structure All intra coding is a special case so it can be permitted even though Kvazaar normally only supports intra periods that are divisible by the GOP length.	2018-06-18 12:26:11 +03:00
Arttu Ylä-Outinen	abdebe0bf9	Fix --owf help message The number of parallel frames is --owf plus one, not --owf minus one. Fixes #204.	2018-06-18 09:33:36 +03:00
Jan Beich	93e1c9f1c3	Add AltiVec detection for BSDs strategyselector.c:377:26: linux/auxvec.h: No such file or directory	2018-06-17 15:38:24 +00:00
Miika Metsoila	98972d26c2	Document that the high tier requires level 4 or higher	2018-06-14 12:41:03 +03:00
Miika Metsoila	62b44efaa4	Write the encoding tier (main/high) into the bitstream	2018-06-14 12:41:03 +03:00
Arttu Ylä-Outinen	a343f6d587	Prepare for delta QPs at CU-level - Replaces lcu_dqp_enabled with max_qp_delta_depth in encoder_control_t. - Fixes set_cu_qps so that it can handle quantization groups of arbitrary size. - Fixes computation of QP predictors so that it works for quantization groups of arbitrary size.	2018-06-13 15:36:19 +03:00
Arttu Ylä-Outinen	dc6b2024ea	Modify reference count asserts to fix data races Changes asserts on the reference count of objects to assert the value after KVZ_ATOMIC_INC instead of directly checking the value. Fixes some data races detected by TSan.	2018-06-12 09:35:07 +03:00
Ari Lemmetti	4fb1c16c61	Add early termination for intra rdo when a zero coefficient block is found.	2018-06-08 21:03:07 +03:00
Ari Lemmetti	492529fb7a	Add the same comment to help message as well...	2018-05-30 14:13:15 +03:00
Ari Lemmetti	0d5972bf03	Add missing sort to intra transform split search so mode at 0 is the best	2018-05-21 13:10:38 +03:00
Sebastien Alaiwan	954bca7d6e	Fix memset parameter	2018-05-17 11:24:49 +02:00
Jaakko Laitinen	f9466efcbb	Close file on error	2018-05-15 11:50:16 +03:00
Reima Hyvönen	9fed29f950	optimization for inter_recon_bipred	2018-04-18 15:25:44 +03:00
Arttu Ylä-Outinen	5c585c4fbc	Update help message Updates the default option values to match the medium preset.	2018-04-03 10:40:37 +03:00
Arttu Ylä-Outinen	2b4e22111a	Update presets The new presets are slower but have better coding efficiency.	2018-04-03 10:37:30 +03:00
Arttu Ylä-Outinen	7185519a1b	Update command line help - Adds missing default values. - Adds help for --crypto and --key. - Adds help for --rd=3. - Adds help for --sao options. - Some changes to help wording.	2018-03-23 14:33:04 +02:00
Arttu Ylä-Outinen	3606860504	Add --no-cpuid option Equivalent to --cpuid=0.	2018-03-23 12:32:27 +02:00
Arttu Ylä-Outinen	fb462b25ef	Fix transform skip for inter The transform skip flag in cu_info_t was stored under the intra substruct even though transform skip can be used for inter as well. This caused bitstream errors. Fixed by moving the flag out of the substruct.	2018-03-20 11:01:33 +02:00
Arttu Ylä-Outinen	b64e46707d	Skip raster scan step in TZ search Raster scan is very slow and the BD-rate improvement is marginal.	2018-03-01 14:04:03 +02:00
Arttu Ylä-Outinen	6877064230	Add zero neighborhood check to TZ search Adds an additional grid search step that starts from the zero motion vector after the normal grid search. The search range for this step is half of the normal range.	2018-03-01 14:02:13 +02:00
Arttu Ylä-Outinen	74a413c46a	Switch to star refinement in TZ search	2018-03-01 13:06:14 +02:00
Arttu Ylä-Outinen	ebee428ee1	Add loop termination to TZ grid search Terminates the grid search if no better motion vector was found in the last three iterations.	2018-03-01 13:06:06 +02:00
Arttu Ylä-Outinen	4c175621dd	Fix TZ grid search and star refinement - Changes TZ grid search and star refinement to keep the origin constant instead of moving to the best position after each iteration. - Changes star refinement to loop until there is no more improvement, instead of running the step only once.	2018-03-01 12:56:57 +02:00
Arttu Ylä-Outinen	9c2d0074a2	Add rounding of motion vectors in inter search When the starting point for integer motion estimation was selected among the merge candidates, the candidate motion vectors were always rounded down. This commit changes the rounding so that they are rounded to the nearest integer MV instead.	2018-03-01 09:39:21 +02:00
Ari Lemmetti	662430d441	Select CU type based on SSD, transform unit tree and mode cost of luma and chroma on --rd=2	2018-02-22 19:26:48 +02:00
Arttu Ylä-Outinen	cb06cfeadb	Drop temporary arrays in bipred search Changes bipred search to use the original source and reconstruction arrays directly instead of copying them.	2018-02-14 11:20:51 +02:00
Arttu Ylä-Outinen	0ea516ba30	Move bipred search to a separate function	2018-02-14 09:56:53 +02:00
Arttu Ylä-Outinen	6f506be12d	Drop dynamic allocation from bipred search Moves the temporary LCU struct used in bipred search from the heap to the stack. The single malloc call was a huge bottleneck in bipred.	2018-02-14 09:55:02 +02:00
Arttu Ylä-Outinen	7155dd0db7	Add negative references to L1 list Changes reference index list creation so that the negative references are added to L1 in addition to L0 when biprediction is enabled and no reordering of pictures is done. Biprediction can now be used with the low-delay GOP structure.	2018-02-07 14:54:52 +02:00
Arttu Ylä-Outinen	4b24cd03a2	Update for crypto++ 6.0.0 compatibility Changes the crypto module to use unsigned char instead of byte. The byte typedef is no longer included in the global namespace in crypto++ 6.0.0. See https://github.com/weidai11/cryptopp/issues/442. Fixes #184.	2018-02-05 13:35:03 +02:00
Arttu Ylä-Outinen	8c53417006	Check zero coefficient cost for inter Checks the cost of flushing all coefficients of an inter block to zero. This is much faster than doing full RDOQ but can still reduce bitrate significantly. Encoding speed is increased since fewer coefficient bits have to be coded with CABAC.	2018-01-29 12:41:56 +02:00
Arttu Ylä-Outinen	018b5ffa64	Move inter CU reconstruction to a new function Moves code for reconstructing all PUs in an inter CU to a new function kvz_inter_recon_cu in inter.c.	2018-01-24 15:05:39 +02:00
Arttu Ylä-Outinen	405b8c1069	Refactor inter MVD cost functions Moves duplicate code for writing the MVD of a single motion vector from kvz_get_mvd_coding_cost_cabac and encoder_inter_prediction_unit to a new function.	2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen	c1cca1ad7f	Refactor inter MV candidate selection Moves duplicate code for checking the best MV candidate from functions calc_mvd_cost, search_pu_inter_ref and search_pu_inter to a new function.	2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen	9067aa4535	Remove an unnecessary copy in SMP/AMP search SMP/AMP search is performed using a lower work tree level than the normal inter search so the prediction info must be copied up if an SMP/AMP mode is chosen. Previously pixels and coefficients were copied as well. Changed to only copy prediction info.	2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen	89a930d6dd	Add part mode bitcost when using SMP/AMP blocks	2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen	fc43643ba5	Use a transform split for SMP and AMP blocks	2018-01-18 10:36:25 +02:00
Arttu Ylä-Outinen	c74ede148b	Fix CBF flags for 4x4 luma blocks CBF flags were not being propagated to the upper level from blocks of size 4x4.	2018-01-18 10:36:25 +02:00
Arttu Ylä-Outinen	0a69e6d18f	Fix selection of transform function for 4x4 blocks DST function was returned for inter luma transform blocks of size 4x4 even though they must use DCT. Fixed by checking the prediction mode of the block in addition to whether it is chroma or luma.	2018-01-18 10:36:25 +02:00
Miika Metsoila	bcedfd6669	Remove the usage of errno in me-steps argument parsing	2018-01-16 14:38:43 +02:00
Miika Metsoila	39ed36830e	Merge branch 'me_steps'	2018-01-16 14:22:59 +02:00
Miika Metsoila	61213e3ad9	Improve step parameter parsing and usage	2018-01-10 15:16:52 +02:00
Arttu Ylä-Outinen	649113a821	Fix inter search being used for 4x4 blocks When 4x4 intra blocks are enabled and inter search is limited to 16x16 and larger blocks, it is possible that inter search is accidentally done for 4x4 blocks. Fixed by checking that block size is at least 8x8 before doing inter search.	2018-01-10 14:21:48 +02:00
Miika Metsoila	e8e0e7596a	Add a step-cutoff parameter for motion estimation search	2017-12-22 14:04:25 +02:00
Miika Metsoila	4e13608b01	Merge branch 'diamond_search'	2017-12-18 14:11:53 +02:00
Miika Metsoila	2cde0d1a18	Document diamond search option	2017-12-12 14:45:01 +02:00
Miika Metsoila	b923b63b42	Add diamond search	2017-12-12 14:40:14 +02:00
Ari Lemmetti	14892fda00	Replace simple coefficient cost estimation with CABAC. Substantial improvement. Approximation proved to be too inaccurate while not giving actually that much speedup.	2017-12-10 01:23:48 +02:00
Miika Metsoila	ea79069dc8	Fix a type warning in encmain.c	2017-12-08 16:22:40 +02:00
Miika Metsoila	6aa4cd7528	Fix type warnings	2017-12-08 16:16:36 +02:00
Miika Metsoila	b3486b5114	Fix gcc/clang warnings and errors in cfg.c	2017-12-08 16:09:00 +02:00
Miika Metsoila	bac07457ea	Merge branch 'hevc_level'	2017-12-08 15:57:38 +02:00
Miika Metsoila	c67a24e6ec	Update readme and --help text	2017-12-07 12:32:46 +02:00
Ari Lemmetti	713e694d82	Define HAVE_STRUCT_TIMESPEC on Visual Studio 2015 and later Fixes redefinition of timespec that Pthreads-Win32 does even if it has been already defined.	2017-12-05 18:26:12 +02:00
Miika Metsoila	f64d42169f	Improve bitrate checking to accommodate non-integer and less than 1 framerates	2017-12-01 17:20:12 +02:00
Miika Metsoila	57cf92d35f	Implement level's bitrate limit checking during encoding	2017-11-28 16:19:44 +02:00
Miika Metsoila	021fb27787	Add high-tier flag	2017-11-20 16:05:28 +02:00
Miika Metsoila	d249059d61	Minor refactoring of level checking	2017-11-20 13:25:26 +02:00
Arttu Ylä-Outinen	cf85d52b9d	Kvazaar version 1.2.0	2017-11-17 15:23:33 +02:00
Miika Metsoila	4c1512e8c5	Add a check for maximum picture width and height for the given level	2017-11-15 16:39:59 +02:00
Arttu Ylä-Outinen	4cb054295a	Fix linkers Overrides the linkers used for kvazaar, libkvazaar.la and kvazaar_tests. When crypto++ is enabled, the C++ linker is used and when it is disabled, the C linker is used. This removes the need to explicitly specify -lstdc++ in configure when crypto++ is used and fixes the build with crypto++ when libstd++ is not installed.	2017-11-13 15:09:38 +02:00
Miika Metsoila	f9a4aba867	Update documentation, fix input fps default value, remove 0 as default level	2017-11-09 16:53:31 +02:00
Miika Metsoila	ebba0a4f01	Test if input conforms to its level's limits (excluding bitrate)	2017-11-08 16:15:41 +02:00
Miika Metsoila	fb4d0c3cf2	Move level argument parsing to the correct place and give it initial values	2017-11-03 15:47:35 +02:00
Miika Metsoila	61a31054e1	Add level command-line parameter	2017-11-03 13:04:05 +02:00
Arttu Ylä-Outinen	9974380cdd	Fix bipred and temporal MVP - Fixes two errors in calculating the POC for the reference frame for temporal candidate MV scaling. - Fixes using the MV for the wrong direction when the temporal MV predictor block uses bi-prediction. Fixes #160.	2017-10-25 12:26:41 +03:00
Arttu Ylä-Outinen	841597e123	Fix picture and slice types Changes handling of intra pictures for --gop=8 so that every picture with POC divisible by the intra period is intra. The first picture is IDR and the rest of the intra pictures are CRA. POC is not reset at CRA pictures. The leading pictures that follow the CRA picture are changed to RASL so they are allowed to refer to pictures before the CRA picture. Changes inter slice types to P when the L1 reference list is empty and to B otherwise. In all-intra, all pictures are now IDR pictures with POC zero.	2017-10-20 13:35:26 +03:00
Jaakko Laitinen	957b6850c3	Change ref list printout to match hm decoded printout	2017-09-25 13:48:56 +03:00
Arttu Ylä-Outinen	20aea8df63	Fix POCs when using --gop=8 When using --gop=8 with an intra period greater than one, a single POC would be skipped before every intra frame. This commit fixes the problem by turning the intra frames into BLA frames with leading pictures when using --gop=8.	2017-09-19 09:31:58 +03:00
Miika Metsoila	6e00f63469	Remove unused variables from search_pu_inter_ref function	2017-09-18 15:36:37 +03:00
Miika Metsoila	7b0101ce3d	Merge branch 'reflist_changes' # Conflicts: # src/encoderstate.c # src/search_inter.c	2017-09-18 14:59:37 +03:00
Miika Metsoila	769b17768d	Change max function to MAX macro for clang/gcc compatibility. Remove couple of unnecessary comments	2017-09-15 14:21:51 +03:00
Miika Metsoila	5f7c5443a3	Remove inter.poc	2017-09-12 14:23:19 +03:00
Miika Metsoila	6bd78a3da7	Reverse L0 list sort direction	2017-09-12 14:23:18 +03:00
Miika Metsoila	83dc7e7f50	Made L0 to sort and fixed mv_ref_coded in search_pu_inter	2017-09-12 14:23:18 +03:00
Timothe FRIGNAC	d3362a238e	changed strtod to strtol	2017-08-31 15:14:31 +02:00
Timothe FRIGNAC	3a1ab54ff0	Fixed memory leaks	2017-08-31 11:51:41 +02:00
Timothe FRIGNAC	466297fd77	Fixed build error	2017-08-29 17:01:18 +02:00
Timothe FRIGNAC	2e130912cb	Add --key opt	2017-08-28 17:15:13 +02:00
Miika Metsoila	a5f4cf09b5	Switched from storing POCs in inter.poc to state->frame->refLXs array	2017-08-21 16:34:57 +03:00
Arttu Ylä-Outinen	409d2114f0	Fix motion vector constraints Fixes integer motion vectors being constrained more than what was necessary when using --mv-constraint or --wpp.	2017-08-11 14:41:36 +03:00
Arttu Ylä-Outinen	7144a00beb	Rewrite thread queue Changes thread queue so that only the jobs that are ready to run are stored in the queue. Other jobs are kept track of by pointers in the reverse dependency lists of other jobs. When a job is ready to run it is appended to the queue. The job queue is stored as a linked list. The definitions of threadqueue_queue_t and threadqueue_job_t are moved to the .c file, turning them into opaque structs. Makes thread queue code simpler. Fixes some TSan errors.	2017-08-11 14:18:12 +03:00
Arttu Ylä-Outinen	bc47fe94af	Drop thread queue debug code	2017-08-11 14:18:12 +03:00
Eemeli Kallio	e5cbc7a205	--sao now enables full sao	2017-08-11 13:26:55 +03:00
Eemeli Kallio	4c3453d26f	Fixed issue with no-sao argument	2017-08-11 13:12:22 +03:00
Eemeli Kallio	8674c0f5ee	Added parameter for band and edge sao.	2017-08-11 11:57:09 +03:00
Eemeli Kallio	d9b93ea368	Added possibility to skip edge or band sao.	2017-08-11 11:51:49 +03:00
Arttu Ylä-Outinen	4b73bdd9aa	Skip checked motion vectors in early termination Changes the second iteration of early termination to skip the motion vectors that were already checked in the first iteration.	2017-08-09 14:29:09 +03:00
Arttu Ylä-Outinen	606d441362	Skip computing MV cost twice in hexagon search Changes the first step of hexagon search to skip the zero offset since the cost of the motion vector has already been computed.	2017-08-09 14:29:09 +03:00
Arttu Ylä-Outinen	fa4648061d	Add mv, cost and bitcost to inter_search_info_t	2017-08-09 14:29:08 +03:00
Arttu Ylä-Outinen	328f051d7f	Put inter search parameters in a single struct Adds struct inter_search_info_t for holding the parameters that are used by most functions related to inter search. Passing the parameters in a single struct greatly reduces the number of parameters for many functions.	2017-08-09 14:27:53 +03:00
Miika Metsoila	0dd069f8af	Fixed using wrong POC in add_temporal_candidate	2017-08-09 13:50:21 +03:00
Miika Metsoila	25e0a954c7	Fixed 2 bugs causing incorrect video output	2017-08-09 13:50:21 +03:00
Arttu Ylä-Outinen	24ecddd2a5	Fix wrong strides in SAO reconstruction Functions kvz_sao_reconstruct and encoder_sao_reconstruct used frame->width as the stride instead of frame->rec->stride when accessing frame->rec->data. This caused errors when using tiles and SAO.	2017-08-01 15:40:49 +03:00
Arttu Ylä-Outinen	f0bf959d17	Fix alignment errors in 32-bit build with MSVC Changes the work_tree parameter in search.c functions from an array to a pointer. Fixes "formal parameter with requested alignment of 8 won't be aligned" errors.	2017-07-28 09:27:02 +03:00
Arttu Ylä-Outinen	9694bd2fae	Fix build on 32-bit systems Function coeff_abs_sum_avx2 that was added in `e950c9b` was outside the AVX2 #if directive.	2017-07-28 09:19:29 +03:00
Arttu Ylä-Outinen	ecb0275cdd	Store CU arrays as pointers to the main array Changes field state->tile->frame->cu_array->data to point to the CU array in the main encoder state. Removes the need to copy the CU array to the main CU array after search.	2017-07-28 08:36:45 +03:00
Arttu Ylä-Outinen	e950c9b101	Add AVX2 implementation for coefficient sum	2017-07-28 07:39:36 +03:00
Arttu Ylä-Outinen	d50ae6990c	Add sum of absolute coefficients to strategies	2017-07-28 07:39:15 +03:00
Arttu Ylä-Outinen	59faca0646	Skip CABAC coefficient cost for --rd=0	2017-07-28 07:33:03 +03:00
Arttu Ylä-Outinen	19e051ea40	Reduce intra threshold Reduces intra threshold for --rd=0 from 20 to 8. Threshold of 20 increased BD-Rate too much.	2017-07-25 13:26:38 +03:00
Arttu Ylä-Outinen	e9cf15465e	Fix inter cost in bipred The cost of coding MV ref indices and MV direction was added to bitcost but not inter cost. Fixed by adding the extra bits to inter as well.	2017-07-24 15:24:04 +03:00
Arttu Ylä-Outinen	edbe00763e	Drop extra parameter in kvz_image_calc_sad Drops the parameter max_lcu_below which was always set to -1.	2017-07-24 15:21:19 +03:00
Arttu Ylä-Outinen	ffac29061f	Fix extrapolated inter SATD	2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen	631ef53d2a	Fix inter cost calculations Inter costs are computed using SAD except when fractional motion estimation or bi-prediction is enabled. This commit changes search_pu_inter_ref to recalculate the cost with SATD. Fixes inter/intra cost comparisons since intra costs are always SATD costs.	2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen	6ce2fb1238	Add pixel offsets to encoder_state_config_tile_t Adds fields offset_x and offset_y to encoder_state_config_tile_t.	2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen	2380ba0d41	Reduce copying in kvz_get_coeff_cost Changes function kvz_get_coeff_cost to only copy the CABAC contexts and not the whole encoder state. Other threads could be simultaneously using the other parts of the encoder state. Only copying the CABAC fixes a TSan data race warning.	2017-07-24 12:38:41 +03:00
Arttu Ylä-Outinen	24b462f801	Align coefficients to 8 bytes Adds alignment attribute to lcu_coeff_t. The coefficients are sometimes handled as 64-bit integers containing four coefficients so the arrays should be aligned to 8 bytes. Fixes a UBSan error about misaligned reads.	2017-07-24 12:37:37 +03:00
Arttu Ylä-Outinen	5ddb43c6fe	Fix undefined left shifts in rdo Replaces left shifts by multiplications when the operand may be a negative value. Left shift of a negative value is undefined behavior.	2017-07-24 12:35:10 +03:00
Arttu Ylä-Outinen	d1e64ad62b	Fix undefined left shifts Replaces left shifts by multiplications when the operand may be a negative value. Left shift of a negative value is undefined behavior.	2017-07-20 11:15:30 +03:00
Arttu Ylä-Outinen	07b5fb9caf	Fix out-of-bounds read in encoderstate When calling encoder_state_encode_leaf with POC 0, index -1 of the GOP array would be accessed. Fixed by skipping the code for I-frames.	2017-07-20 11:15:30 +03:00
Arttu Ylä-Outinen	8c4a3473a8	Change --owf=auto and --threads=auto selection Changes OWF selection so that it is chosen based on the maximum number of parallel CTUs. Number of threads is limited to prevent overhead from extra threads.	2017-07-20 09:42:28 +03:00
Arttu Ylä-Outinen	4fc9b743c1	Drop an unnecessary pthread_cond_broadcast Drop pthread_cond_broadcast on threadqueue->cond in function kvz_threadqueue_waitfor. The broadcast caused threads to be woken up more often than necessary.	2017-07-19 11:09:30 +03:00
Arttu Ylä-Outinen	14003c6a30	Disable printing PSNR with --no-psnr	2017-07-19 10:38:37 +03:00
Arttu Ylä-Outinen	e90bde5c62	Clarify PSNR output Adds letters Y, U and V to the PSNR output to make it clearer that the printed values are the luma and chroma PSNR.	2017-07-19 10:33:43 +03:00
Arttu Ylä-Outinen	fdb3480b54	Enable strategies for SAO reconstruction Re-enables strategies for SAO reconstruction. They were disabled in commit `ec9ff42`.	2017-07-11 10:35:18 +03:00
Arttu Ylä-Outinen	333dba3884	Add static to SAO strategies	2017-07-11 10:02:01 +03:00
Miika Metsoila	e8cc2d8f6a	Small fixes	2017-07-07 13:58:19 +03:00
Arttu Ylä-Outinen	67a60a35e3	Fix invalid calls to normalize_lcu_weights Changes encoder_state_init_new_frame to only call normalize_lcu_weights when the weights have been written to the array and rate control is enabled. When rate control is disabled, the weights are not used.	2017-07-07 11:05:31 +03:00
Arttu Ylä-Outinen	563bc26e71	Fix out-of-bounds read in AVX2 SAO AVX2 version of SAO loaded offsets with a 256 bit read even though there are only five 32 bit integers.	2017-07-06 13:04:52 +03:00
Arttu Ylä-Outinen	0850b17f96	Drop get_wpp_limit in search_inter WPP limit for motion vectors is now computed inside fracmv_within_tile.	2017-07-05 13:22:53 +03:00
Arttu Ylä-Outinen	2a85f0f5a4	Move hard-coded MV limits to encoder_control_t Adds field max_inter_ref_lcu to encoder_control_t. It is used to set up inter-LCU dependencies in encoder_state_encode_leaf and restrict motion vectors in fracmv_within_tile.	2017-07-05 13:22:53 +03:00
Arttu Ylä-Outinen	bb5354f7e2	Relax inter-CTU dependencies when SAO is off When using WPP and OWF, the first CTU of a row depends on the last CTU of the row below in the reference frame. This is necessary when SAO is enabled since we currently do SAO for a whole CTU row at a time. When SAO is disabled, however, it is unnecessary to wait for the whole row. Changes CTUs to depend only on the CTU below in the reference frame instead of the whole row when WPP and OWF are enabled and SAO disabled. Gives a significant speedup when running on a machine with many CPU cores.	2017-07-05 13:21:06 +03:00
Arttu Ylä-Outinen	1efa2708b2	Do SAO reconstruction for a single CTU at a time Moves SAO reconstruction into encoder_state_worker_encode_lcu instead of doing it in a separate step for the whole CTU row. Reconstruction of the rightmost 10 pixels and bottommost 10 pixels of a CTU is delayed until the neighboring CTU has been deblocked. Doing SAO for the whole CTU row at a time caused unnecessary inter-CTU dependencies when using WPP and OWF. The first CTU of a row would need to wait until SAO was done for the row below in the previous frame. Moving SAO reconstruction to immediately after deblocking each CTU fixes this problem.	2017-07-04 15:14:31 +03:00
Arttu Ylä-Outinen	ec9ff42077	Rewrite SAO recon to handle arbitrary sized blocks Adds width and height parameters to function kvz_sao_reconstruct and changes it to take coordinates in units of pixels. This will be useful for doing SAO for areas smaller than a whole CTU.	2017-06-30 16:09:18 +03:00
Miika Metsoila	dcd7acf4fd	Fixed crash and incorrect info output	2017-06-27 16:05:15 +03:00
Miika Metsoila	f8b6234fdb	Changes to reference lists to behave more like L0/L1 lists from the specification	2017-06-27 16:05:15 +03:00
Arttu Ylä-Outinen	2c66e0bbd2	Fix warnings about invalid reads in AVX2 ipol AVX2 filter functions read pixels in chunks of 8 or 16 bytes. At the end of the block, the read goes out of the bounds of the pixels array. The extra pixels do not affect the result. Fixes valgrind complaining about the invalid reads by allocating 5 extra pixels in kvz_get_extended_block_avx2	2017-06-22 09:37:55 +03:00
Arttu Ylä-Outinen	4d20e156db	Fix handling intra period not multiple of GOP length With low delay GOP structure, it is possible to use an intra period that is not a multiple of the GOP structure length. Commit `00c9f52` changed encoder_state_init_new_frame to reset POC on intra frames. GOP offset, however, was not reset, resulting in invalid POCs and references for the following frames. This commit changes function kvz_encoder_feed_frame so that GOP offset is correctly reset on intra frames.	2017-06-22 09:29:00 +03:00
Arttu Ylä-Outinen	00c9f52bd4	Fix setting picture type when using GOP Changes encoder_state_init_new_frame to set intra frame pictype to KVZ_NAL_IDR_W_RADL even when using GOP.	2017-06-21 13:21:47 +03:00
Arttu Ylä-Outinen	f54a25f112	Fix crash when immediately closing encoder When closing the encoder, the pictures stored in the input frame buffer are freed by repeatedly calling kvz_encoder_feed_frame. If the encoder was closed immediately after opening it, kvz_encoder_feed_frame would be called with an unprepared encoder state. This would trigger an assert. Fixed by changing kvz_encoder_feed_frame so that it does not require the encoder state to be prepared.	2017-06-15 11:57:46 +03:00
Arttu Ylä-Outinen	b74e0458fd	Set inter transform depth to zero Sets max_transform_hierarchy_depth_inter to 0 in SPS. This saves some bits because split_transform_flag does not need to be coded for inter blocks. When SMP and AMP blocks are enabled the depth is set to 1 instead. Otherwise inter split flag would default to 1 for SMP and AMP blocks, resulting in an unnecessary transform split.	2017-06-08 10:08:20 +03:00
Arttu Ylä-Outinen	8dd01ba5a9	Refactor helper functions in search Combines functions lcu_set_intra_mode and lcu_set_inter_pu to a single function. Removes some duplicated code.	2017-06-06 10:32:09 +03:00
Arttu Ylä-Outinen	1bbecf7584	Refactor work tree copy functions Extracts common code shared by work_tree_copy_up and work_tree_copy_down to a separate function.	2017-06-06 10:32:00 +03:00
Arttu Ylä-Outinen	2b169d5d63	Fix crash in kvazaar_close Changes kvazaar_close to stop all threads before freeing encoder states. Fixes a crash when the encoder is closed before all pictures have been encoded.	2017-06-02 10:05:33 +03:00
Arttu Ylä-Outinen	eb9a05b7ef	Fix memory leak Changes kvazaar_close to free the remaining pictures in the input frame buffer. Fixes a memory leak when the encoder is closed while there are pictures left in the buffer.	2017-06-01 15:39:35 +03:00
Arttu Ylä-Outinen	8b2483ca1c	Combine intra reconstruction functions Replaces function kvz_intra_recon_lcu_luma and kvz_intra_recon_lcu_chroma in intra.c with function kvz_intra_recon_cu. The new function can handle reconstruction for both luma and chroma. Removes some duplicated code.	2017-05-24 12:07:31 +03:00
Arttu Ylä-Outinen	e67fdb853d	Move intra leaf TB recon to a separate function Moves code for intra leaf transform block reconstruction from functions kvz_intra_recon_lcu_luma and kvz_intra_recon_lcu_chroma to a new function intra_recon_tb_leaf. Removes some duplicated code.	2017-05-24 12:07:31 +03:00
Arttu Ylä-Outinen	13d2fdbd21	Drop unused kvz_videoframe_get_cu functions	2017-05-24 11:15:31 +03:00
Arttu Ylä-Outinen	f5eef7f33c	Use luma pixel coordinates in encode_coding_tree Changes functions encode_intra_coding_unit and encode_coding_tree to take coordinate arguments in units of luma pixels instead of 8 px blocks. This should make the code easier to understand.	2017-05-24 11:15:31 +03:00
Arttu Ylä-Outinen	525a5180ff	Combine intra CU encoding functions Merges functions encode_intra_coding_unit and encode_intra_coding_unit_encry. Removes a lot of duplicated code.	2017-05-24 11:12:40 +03:00
Arttu Ylä-Outinen	610c91b0c5	Use luma pixel coordinates in TU coding functions Changes functions encode_transform_unit and encode_transform_coeff to take coordinate arguments in units of luma pixels instead of 4 px blocks. This should make the code easier to understand.	2017-05-23 15:36:16 +03:00
Arttu Ylä-Outinen	2e8838de6e	Fix crash when crypto compiled in but disabled When kvazaar was built with crypto++ but running without using encryption features, kvazaar attempted to delete an uninitialized crypto handle. Fixed by setting the handle to NULL in kvz_encoder_state_init.	2017-05-23 14:01:48 +03:00
Arttu Ylä-Outinen	2f2c281e8e	Fix a memory leak in crypto A CryptoPP::CFB_Mode<CryptoPP::AES>::Encryption was allocated at the beginning of encoder_state_encode_leaf and was never freed. This commit changes encoder_state_worker_encode_lcu to delete the CFB_Mode. Also moves crypto handle from encoder_state_config_tile_t to encoder_state_t so that it can be safely deleted without affecting other threads in the same tile.	2017-05-23 11:51:25 +03:00
Arttu Ylä-Outinen	22155950c1	Rewrite crypto to conform to kvazaar code style	2017-05-23 11:51:25 +03:00
Arttu Ylä-Outinen	6829865190	Fix inline declaration in intra_mode_encryption Moves the inline declaration of intra_mode_encryption before the type and changes it to use the INLINE macro. Inline declaration after type triggered a warning on GCC.	2017-05-23 11:50:32 +03:00
Arttu Ylä-Outinen	5f8e17d4ba	Eliminate a race condition in threadqueue Fixes the order of acquiring locks for the job and its dependency in kvz_threadqueue_job_dep_add. The dependency is locked before the job that depends on it. This is the same order as in threadqueue_worker. Acquiring the locks in different order in kvz_threadqueue_job_dep_add and threadqueue_worker would sometimes result in a deadlock.	2017-05-18 12:25:53 +03:00
Arttu Ylä-Outinen	4b213477f0	Return best MV from inter early terminate When using --me-early-termination=sensitive, early termination of inter search used to always return the starting point if no tested motion vector was good enough to continue the search. This commit changes early_termination to always return the best motion vector and cost found.	2017-05-18 09:05:14 +03:00
Arttu Ylä-Outinen	382636de55	Fix handling too large QPs Changes kvz_config_validate to output an error if the given QP is out of range and changes kvz_set_picture_lambda_and_qp to clip the QP to the valid range if it is too large after applying QP offset from GOP structure.	2017-05-17 12:41:51 +03:00
Arttu Ylä-Outinen	de8b59c681	Drop unused function kvz_coefficients_blit	2017-05-12 16:48:30 +03:00
Arttu Ylä-Outinen	bcfa5a3cd9	Add a comment explaining the coefficient order	2017-05-12 16:46:57 +03:00
Arttu Ylä-Outinen	95775a1645	Change coefficient storage order Changes coefficient storage order to a zig-zag order. Reduces unnecessary copying of coefficients to temporary arrays.	2017-05-12 16:46:57 +03:00
Arttu Ylä-Outinen	9395867a9a	Quantize all colors in a single traversal Changes kvz_quantize_lcu_residual to process all three colors in a single traversal of the TU tree.	2017-05-12 16:42:41 +03:00
Arttu Ylä-Outinen	1e58fd6b16	Split kvz_quantize_lcu_residual Splits kvz_quantize_lcu_residual to two functions that handle the TU tree recursion and quantization of a single TU.	2017-05-12 16:42:41 +03:00
Arttu Ylä-Outinen	cc87e0dcc7	Combine luma and chroma quantization functions Replaces functions kvz_quantize_lcu_luma_residual and kvz_quantize_lcu_chroma_residual in transform.c with function kvz_quantize_lcu_residual. The new function can handle any of the YUV colors. Removes some duplicated code.	2017-05-12 16:42:41 +03:00
Arttu Ylä-Outinen	1357dd0599	Pass coeffs through encoder state Changes the way coefficients are passed from kvz_search_lcu to kvz_encode_coding_tree. Drops fields coeff_y, coeff_u and coeff_v in videoframe_t and instead passes them through field coeff in encoder_state_t.	2017-05-12 16:42:41 +03:00
Eemeli Kallio	2cad3173ec	Reduced amount of modes for search_intra_rdo	2017-05-12 15:56:07 +03:00
Arttu Ylä-Outinen	26adef4492	Merge branch 'erp-aqp'	2017-05-12 15:05:24 +03:00
Eemeli Kallio	55e0e65733	Added INLINE to kvz_get_ic_rate and kvz_get_coded_level in rdo.c	2017-05-12 15:03:30 +03:00
Arttu Ylä-Outinen	ee3d4d0e78	Add adaptive QP for 360 degree video Adds option --erp-aqp for enabling adaptive QP for 360 degree video with equirectangular projection. When projected into a spherical surface, the middle part of the video covers relatively larger area than the top and bottom parts. Enabling --erp-aqp sets up a ROI delta QP array which uses higher QPs for the top and bottom of the video and lower QPs for the middle part.	2017-05-11 12:31:53 +03:00
Arttu Ylä-Outinen	79cb3a2fd3	Permit negative QP deltas in ROI Delta QPs should not be arbitrarily restricted to positive values.	2017-05-11 12:13:47 +03:00
Arttu Ylä-Outinen	edfbd6f122	Add field lcu_dqp_enabled to encoder_control_t Delta QPs for LCUs are enabled when either ROI coding or rate control is enabled. Having a single field is simpler than always checking whether ROI or rate control is enabled.	2017-05-11 12:13:47 +03:00
Arttu Ylä-Outinen	2f2405dfe6	Fix crash when PU depth is limited When video width or height was not a multiple of the smallest CU size, no prediction would be performed at the border CUs. Kvazaar would later crash at an assertion failure when attempting to write the bitstream for the CU. Fixed by permitting inter and intra prediction when the CU split is forced, even if CUs of that size would otherwise be disabled.	2017-04-27 10:35:48 +03:00
Arttu Ylä-Outinen	9130b5107c	Change handling of infinite PSNR in encmain Changes encmain to print 999.99 as PSNR when SSE is zero. This behavior is in line with HM. Previously SSE was set to 99 when it was zero.	2017-04-27 10:35:13 +03:00

2812 commits