hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-24 10:34:05 +00:00

Author	SHA1	Message	Date
Joose Sainio	977e885ea2	Fix issue with gop=0 introduced in `1c36f68d0c`	2019-07-05 12:57:27 +03:00
Marko Viitanen	c6217e236f	Enable 4-tap filtering for the intra angular	2019-07-04 16:26:10 +03:00
Marko Viitanen	cda6d951c0	Change DCT arrays back to 8-bit -> some frames are now correct	2019-07-04 15:59:10 +03:00
Marko Viitanen	8280bd3217	Add channel info to angular_pred and fix the displacement tables. Also includes 4-tap intra filtering code commented out	2019-07-04 09:35:47 +03:00
Marko Viitanen	5e4369d6b0	Fix the kvz_cabac_encode_aligned_bins_ep function -> cabac coding now correct	2019-07-03 15:55:52 +03:00
Marko Viitanen	3fad4b0a98	Disable kvz_cabac_encode_aligned_bins_ep for now and add a ToDo message	2019-07-03 15:44:35 +03:00
Sami Ahovainio	ce1e67cc3a	Modified header flags to match VTM commit b9080ff45bec368c44f0c43a32dcd6804ef9f5d6	2019-07-01 13:58:15 +03:00
Sami Ahovainio	3863064d90	Fixed bugs in split decision and coefficient coding.	2019-07-01 13:00:43 +03:00
Mikko Pitkänen	a7f09c8114	Merge branch 'threadwrapper'	2019-06-24 16:54:59 +03:00
Sami Ahovainio	db5c0230e5	Fixed coefficient sign hiding	2019-06-20 16:26:01 +03:00
Sami Ahovainio	b51254cafd	Fixed significant coefficient group context calculation	2019-06-20 15:47:13 +03:00
Sami Ahovainio	5e0bea962c	Fixed split context decision	2019-06-20 15:30:49 +03:00
Sami Ahovainio	12322144f0	Removed debug print from context.c	2019-06-20 15:18:22 +03:00
Sami Ahovainio	3a9800d07d	Fixed coefficient coding. Fixed headers to match VTM commit e65075531471a68632bc9252d607655a0feeabc6	2019-06-20 14:43:03 +03:00
Mikko Pitkänen	3dd606ce2e	Add new threadwrapper	2019-06-18 18:45:45 +03:00
Sami Ahovainio	2c78aa0642	Fixes to coeff coding.	2019-06-13 12:01:29 +03:00
Joose Sainio	c94077d15e	remove hardcoded value	2019-06-12 14:37:41 +03:00
Joose Sainio	ac68c8444d	remove negation that wasn't supposed to be there	2019-06-12 14:35:24 +03:00
Joose Sainio	5851dcc3be	missing negation	2019-06-12 14:08:18 +03:00
Joose Sainio	1c36f68d0c	Fix owf>=9 gop=8 and add test to catch such problem in future	2019-06-12 14:04:41 +03:00
Sami Ahovainio	3564b4829e	Fixed split context decision. Modified intra mode initialization to match VTM version aa76fc5c04cf43390f43d63f9977bea8ee31997a.	2019-06-12 12:59:16 +03:00
Sami Ahovainio	a8a53e15b5	Fixed headers to match VTM commit aa76fc5c04cf43390f43d63f9977bea8ee31997a. Added multi_ref_line flag coding.	2019-06-07 13:37:45 +03:00
Ari Lemmetti	933ff6ed55	Merge branch 'set-qp-in-cu-fix'	2019-06-07 09:01:03 +03:00
Sami Ahovainio	8d2581e58c	Fixed issue with kvz_go_rice_par_abs where passing an unsigned argument caused the MIN function to return the wrong value. Modified coefficient coding to match VTM 5.0. Some issues still remain.	2019-06-05 15:57:18 +03:00
Sami Ahovainio	367f1b2129	Fixed splitting bug caused by wrong values in the headers. Fixed header flags to match VTM commit 5703e81b2de677d976ec15423f5768b17619ba6a	2019-06-05 11:21:02 +03:00
Sami Ahovainio	76d56290ed	Fixed VUI header writing. Fixed debug prints of NAL headers and rbsp_stop_one_bit.	2019-05-31 11:13:11 +03:00
Ari Lemmetti	c6da839002	Set lcu sqrt lambda according to lcu lambda instead of frame lambda when ROI is used	2019-05-29 18:32:10 +03:00
Marko Viitanen	8282a18c36	Fixed headers and NAL writing to match the latest VTM master 988c22cbb9c58584cac3ef0ec7794cafbea6dfd6	2019-05-29 16:18:35 +03:00
Sami Ahovainio	4768ba0628	Minor fixes to header writing. Added contexts for multi_ref_line and BDPCM. Functions added for writing both in bitstream, but they are both disabled for now.	2019-05-29 13:00:19 +03:00
Sami Ahovainio	3339e12169	Fixed some header flags	2019-05-27 09:56:56 +03:00
Ari Lemmetti	9339845e8b	Set QP completely at CU level as the name '--set-qp-in-cu' implies -Move slice delta QP to CU level when using --set-qp-in-cu -Separate functionality from roi	2019-05-24 20:38:39 +03:00
Pauli Oikkonen	081d16fc33	Fix intrinsics that may be missing on some systems Create a header to collect all the workarounds for missing intrinsics in one place	2019-05-23 19:59:40 +03:00
Sami Ahovainio	5b46fbd878	Added multi_ref_idx variable for intra coding (is 0 throughout the code for now). Modified prediction flag writing. Chroma pred flag remains unchanged (ToDo). Added bitstream debug printing on VERBOSE mode.	2019-05-21 12:28:05 +03:00
Sami Ahovainio	ed4e218702	Updated coefficient coding to match VTM 5.0	2019-05-13 15:30:43 +03:00
Sami Ahovainio	504c3dfd1b	Modified the headers to match current VTM headers	2019-05-07 16:30:06 +03:00
Marko Viitanen	30a8a7b97c	WIP fixing the last significant xy coding	2019-05-07 15:01:02 +03:00
Pauli Oikkonen	87a9208db8	Eliminate cvtsi64_si128 intrinsic Apparently it'll cause Win32 builds to break because it emits the movq instruction or something..	2019-04-17 16:30:40 +03:00
Pauli Oikkonen	7175d20bb2	Still include stdint.h for non-vector builds	2019-04-15 19:36:01 +03:00
Pauli Oikkonen	1315c7e2b0	Do not compile any vector code for non-SSE4/AVX2 builds	2019-04-15 19:10:48 +03:00
Pauli Oikkonen	f5f70e7bc5	Merge branch 'sad-optimization'	2019-04-15 19:02:01 +03:00
Jan Beich	85f46e17a9	Detect AltiVec via elf_aux_info() on FreeBSD 12+	2019-04-01 13:08:04 +00:00
Jan Beich	82486255da	Simplify AltiVec detection on Linux	2019-04-01 13:08:04 +00:00
Marko Viitanen	1546acfdb9	New NAL unit IDs and header changes	2019-03-28 10:11:36 +02:00
Marko Viitanen	36eab9c170	New cabac context models with "rate"	2019-03-27 12:38:19 +02:00
Marko Viitanen	3bdc8ac8d3	Fix intra_chroma_pred_mode and cbf contexts	2019-03-26 09:10:09 +02:00
Marko Viitanen	d15f58517f	Changed intra coding to use 6 MPM, implemented merge sort and MPM selection	2019-03-20 15:20:31 +02:00
Marko Viitanen	1081336868	Updated intra pred mode init values	2019-03-20 15:18:32 +02:00
Marko Viitanen	f3acd245ae	New cabac coding function: kvz_cabac_encode_trunc_bin	2019-03-20 15:17:54 +02:00
Marko Viitanen	80d6e4bf05	New split flag calculations	2019-03-20 09:07:58 +02:00
Marko Viitanen	8c84348010	New entropy bit table	2019-03-20 09:07:22 +02:00
Marko Viitanen	2d0348aa6d	New context models	2019-03-20 09:06:57 +02:00
Marko Viitanen	052080747e	New CABAC functions	2019-03-20 09:06:26 +02:00
Marko Viitanen	20667fdba6	Update header bits to VTM 4.0+	2019-03-11 14:02:12 +02:00
Pauli Oikkonen	6d43759604	Create a border-respecting 32-wide AVX hor_sad	2019-03-07 18:01:22 +02:00
Pauli Oikkonen	f218cecb38	Remove offending hor_sad_avx2_w32 function Consider possibly creating a non-offending AVX2 version instead, the way hor_sad_sse41_w32 works. Or maybe there's more essential work to do.	2019-03-05 22:51:41 +02:00
Pauli Oikkonen	df2e6c54fd	4-unroll hor_sad_sse41_arbitrary This may not increase perf though because it's so rarely used function, so keeping icache footprint may be more essential...	2019-03-05 22:45:23 +02:00
Pauli Oikkonen	448eacba7b	Avoid overreading block borders in hor_sad_sse41_arbitrary	2019-03-05 22:34:50 +02:00
Eemeli Kallio	c159e275b7	Merge branch 'max_merge'	2019-03-05 14:39:03 +02:00
Pauli Oikkonen	41f51c08c4	Avoid overrunning buffer in hor_sad_sse41_w32	2019-03-01 15:37:38 +02:00
Pauli Oikkonen	bcd9879359	Include quant coeff range check in non-scaling list execution path too	2019-02-27 17:26:44 +02:00
Pauli Oikkonen	24e6363f64	Remove the kvz_quant_avx2 wrapper function	2019-02-27 16:32:58 +02:00
Pauli Oikkonen	748820f3c5	Eliminate unnecessary loading of coeffs if scaling lists are off	2019-02-27 16:26:35 +02:00
Pauli Oikkonen	5994350f40	Allow quant_flat_avx2 to be used with scaling lists on	2019-02-27 16:25:59 +02:00
Eemeli Kallio	7f4e0acf41	Added check if max-merge is out of bounds	2019-02-19 13:53:42 +02:00
Pauli Oikkonen	9b0e079262	Use SSE instructions for 64-bit SADs instead of MMX VC++ seems to choke on MMX instructions	2019-02-18 20:13:33 +02:00
Pauli Oikkonen	d8b8923028	Add LGPL notices to reg_sad headers	2019-02-18 17:52:47 +02:00
Eemeli Kallio	2a40560888	some variables to const	2019-02-12 11:24:10 +02:00
Eemeli Kallio	8f8e7bb53c	Added possibility to reduce the maximum number of merge candidates.	2019-02-12 09:21:03 +02:00
Marko Viitanen	1165219842	Update PTL, SPS ext and SPS flags to match VTM 4rc1	2019-02-07 10:00:04 +02:00
Pauli Oikkonen	770db825b9	Create hor_sad_w8 and w4 epol mask the way w16 works	2019-02-06 19:34:26 +02:00
Pauli Oikkonen	aa19bcac8a	Avoid branching in creating shuffle mask in hor_sad_w16	2019-02-06 18:58:46 +02:00
Pauli Oikkonen	2d05ca8520	Remove width from constant-width hor_sad func params They should kinda know it already	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	57db234d95	Move 32-wide SSE4.1 hor_sad to picture-sse41.c It's not used by picture-avx2.c that also includes the header, so it should not be in the header	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	dd7d989a39	Implement 32-wide hor_sad on AVX2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ff70c8a5ec	Utilize horizontal SAD functions for SSE4.1 as well	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	f5ff4db01f	4-wide hor_sad border agnostic	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	35e7f9a700	Fix hor_sad w8 to work with both borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	836783dd6e	Use hor_sad_w32 for both left and right borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	69687c8d24	Modify hor_sad_sse41_w16 to work over left and right borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	51c2abe99a	Modify image_interpolated_sad to use kvz_hor_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	1e0eb1af30	Add generic strategy for hor_sad'ing a non-split width block	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	686fb2c957	Unroll arbitrary-width SSE4.1 hor_sad by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	768203a2de	First version of arbitrary-width SSE4.1 hor_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ccf683b9b6	Start work on left and right border aware hor_sad Comes with 4, 8, 16 and 32 pixel wide implementations now, at some point investigate if this can start to thrash icache	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	760bd0397d	Pad the image buffer by 64 bytes from both ends This will be necessary for an efficient and straightforward implementation of hor_sad for blocks over 16 pixels wide, because they cannot use the shuffle trick because inter-lane shuffling is so hard to do	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	c36482a11a	Fix bug in 24-wide SAD facepalm	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	f781dc31f0	Create strategy for ver_sad Easy to vectorize	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ca94ae9529	Handle extrapolated blocks with unmodified width using optimized_sad pointer	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91b30c7064	Tidy up kvz_image_calc_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	9db0a1bcda	Create get_optimized_sad func for SSE4.1	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91380729b1	Add generic get_optimized_sad implementation NOTE: To force generic SAD implementation on devices supporting vectorized variants, you now have to override both get_optimized_sad and reg_sad to generic (only overriding get_optimized_sad on AVX2 hardware would just run all SAD blocks through reg_sad_avx2). Let's see if there's a more sensible way to do it, but it's not trivial.	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	45f36645a6	Move choosing of tailored SAD function higher up the calling chain	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91cb0fbd45	Create strategy for directly obtaining pointer to constant-width SAD function	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	94035be342	Unify unrolling naming conventions	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	517a4338f6	Unroll SSE SAD for 8-wide blocks to process 4 lines at once	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	0f665b28f6	Unroll arbitrary width SSE4.1 SAD by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	cbca3347b5	Unroll 64-wide AVX2 SAD by 2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	84cf771dea	Unroll 32 and 16 wide SAD vector implementations by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	5df5c5f8a4	Cast all pointers to const types in vector SAD funcs Also tidy up the pointer arithmetic	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a711ce3df5	Inline fixed width vectorized SAD functions	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	6504145cce	Remove 16-pixel wide AVX2 SAD implementation At least on Skylake, it's noticeably slower than the very simple version using SSE4.1	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4cb371184b	Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	796568d9cc	Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4d45d828fa	Use constant-width SSE4.1 SAD funcs for AVX2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	2eaa7bc9d2	Move SSE4.1 SAD functions to separate header	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	d2db0086e1	Create constant width SAD versions for 8 and 16 pixels	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a13fc51003	Include a blank AVX2 strategy registration function even in non-AVX2 builds	2019-02-04 19:52:24 +02:00
Pauli Oikkonen	d55414db66	Only build AVX2 coeff encoding when supported ..whoops	2019-02-04 19:34:30 +02:00
Pauli Oikkonen	3fe2f29456	Merge branch 'encode-coeffs-avx2'	2019-02-04 18:52:31 +02:00
Pauli Oikkonen	722b738888	Fix more naming issues	2019-02-04 16:05:43 +02:00
Pauli Oikkonen	e26d98fb75	Rename a couple variables and add crucial comments	2019-02-04 15:57:07 +02:00
Pauli Oikkonen	f186455619	Move encode_last_significant_xy out of strategy modules It's the exact same in both AVX2 and generic, and does not seem to be worth even trying to vectorize	2019-02-04 14:55:41 +02:00
Pauli Oikkonen	3f7340c932	Fine-tune pack_16x16b_to_16x2b Avoid mm_set1 operation when it's possible to create the constant with one bit-shift operation from another instead. Thanks Intel for 3-operand instruction encoding!	2019-02-04 14:44:47 +02:00
Pauli Oikkonen	314f5b0e1f	Rename 16x2b cmpgt function, comment it better, optimize it slightly Eliminate an unnecessary bit masking to make it even more messy	2019-02-04 14:44:32 +02:00
Pauli Oikkonen	d8ff6a6459	Fix _andn_u32 to work on old Visual Studio	2019-02-01 15:34:42 +02:00
Pauli Oikkonen	26e1b2c783	Use (u)int32_t instead of (unsigned) int in reg_sad_sse41	2019-01-10 14:37:04 +02:00
Pauli Oikkonen	3a1f2eb752	Prefer SSE4.1 implementation of SAD over AVX2 It seems that the 128-bit wide version consistently outperforms the 256-bit one	2019-01-10 13:48:55 +02:00
Pauli Oikkonen	9b24d81c6a	Use SSE instead of AVX for small widths Highly dubious if this will help performance at all	2019-01-07 20:12:13 +02:00
Pauli Oikkonen	b2176bf72a	Optimize SSE4.1 version of SAD Make it use the same vblend trick as AVX2. Interestingly, on my test setup this seems to be faster than the same code using 256-bit AVX vectors.	2019-01-07 19:40:57 +02:00
Pauli Oikkonen	887d7700a8	Modify AVX2 SAD to mask data by byte granularity in AVX registers Avoids using any SAD calculations narrower than 256 bits, and simplifies the code. Also improves execution speed	2019-01-07 18:53:15 +02:00
Pauli Oikkonen	7585f79a71	AVX2-ize SAD calculation Performance is no better than SSE though	2019-01-07 16:26:24 +02:00
Pauli Oikkonen	ab3dc58df6	Copy SAD SSE4.1 impl to AVX2	2019-01-03 18:31:57 +02:00
Pauli Oikkonen	45ac6e6d03	Tidy pack_16x16b_to_16x2b comments	2019-01-03 16:37:05 +02:00
Ari Lemmetti	cd818db724	Add missing quantization and residual in cost calculation (inter rd=2).	2018-12-21 15:55:29 +02:00
Pauli Oikkonen	016eb014ad	Move packing 16x16b -> 16x2b into separate function	2018-12-20 10:51:44 +02:00
Ari Lemmetti	b234897e8a	Fix smp and amp blocks in fme and revert previous change. Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc. Calculate SATD on the 8x4, ... part	2018-12-19 21:30:53 +02:00
Pauli Oikkonen	9aaa6f260d	Fixes to enable portability	2018-12-18 20:42:09 +02:00
Pauli Oikkonen	2fdbbe9730	Move CG reordering code from quant-avx2 to shared header	2018-12-18 19:42:18 +02:00
Pauli Oikkonen	d02207306d	Create a header file for shared AVX2 code	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	361bf0c7db	Precompute >=2 coeff encoding loop with 2-bit arithmetic Who needs 16x16b vectors when you can do practically the same with 16x2b pseudovectors in 32-bit general purpose registers!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	940b0e9e6a	Require BMI2 for AVX2 build Any processor implementing AVX2 should also implement BMI2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	f66cb23d5b	Optimize greater1 encoding loop Calculating the c1 variable need not be a serial operation!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	8c8b791c35	Vectorize kvz_context_get_sig_ctx_inc	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	033261eb74	Eliminate two branches using bit magic	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c4434e8d04	Scan CG's in forward order to simplify finding last significant	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	efd097f5a5	Vectorize the coeff group loop to some extent	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	a01362e638	use the efficient method of reordering raster->scan	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	50a888e789	Use the efficient method to find first and last nz coeffs in block	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	7e9203f566	Scan coeff groups in scan order to help find last significant one	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	9a5a6fdbc7	Simplify two ifs in encode_coeff_nxn-avx2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	37a2a8bac8	See if loop can be optimized by rearranging	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	584f2f74b6	Vectorize significant coeff group scanning loop	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	1bfed73221	Add AVX2 strategy for encode_coding_tree	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c3a6f3112a	Add generic strategy group for encode_coding_tree	2018-12-18 19:41:09 +02:00
Marko Viitanen	1ef851ab4b	Disable FME on amp/smp blocks with width or height not divisible by 8	2018-12-18 10:28:21 +02:00
Joose Sainio	b71c5573f0	Merge branch 'rate_control_fix'	2018-12-17 12:39:27 +02:00
Sergei Trofimovich	68a70e45a1	x86 asm: mark stack as non-executable Gentoo's `scanelf` QA tool detects writable/executable stack of assembly-written files as: ``` $ scanelf -qRa . 0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o 0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o 0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-sad.o 0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-satd.o ``` Normally the C compiler emits non-executable stack marking (or the GNU assembler via `-Wa,--noexecstack`). The change adds non-executable stack marking for yasm-based assembly files. https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>	2018-12-16 11:31:56 +00:00
Reima Hyvönen	1fcc5c6a8d	Merge branch 'bipred_recon'	2018-12-11 09:59:35 +02:00
Reima Hyvönen	e4a10880f3	Added case 12 to bipred_recon no mov	2018-12-11 09:52:17 +02:00
Marko Viitanen	a4f3968e52	Fix Visual Studio errors by initializing some variables used in AVX2 signhiding	2018-12-11 09:33:26 +02:00

2629 commits