hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-12-03 21:44:06 +00:00

Author	SHA1	Message	Date
Pauli Oikkonen	997fd369b3	Redo calc_sao_edge_dir_avx2 Do it wider, 32 pixels at once!	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	db1e475e02	Use i32 instead of i8 for x/y offsets Doesn't matter too much, because this number isn't used in SIMD computation, only as a memory reference offset.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	12de466ef5	Reimplement non-band SAO color reconstruction in AVX2 Streamline things to work on 32 pixels at once instead of 8	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	e8bff99329	Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction Vectorize it all, hope this helps with perf	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	7b5dffa855	Implement calc_sao_offset_array in AVX2 To be efficient, the AVX2 color reconstruction algorithm will need offsets in byte, not dword, arrays. This is completely specific to 8-bit pixels and the function signature is fundamentally distinct from the generic algorithm, so it's better to not strategize SAO offset array calculation.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	08881f5e9b	(TEMP) (TODO) (whatever) Avoid compiler warnings I want the CI to not crash on its -Wall -Werror, but instead to actually build the thing and report me about actual memory errors etc	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	c18adc5ee0	Redo sao_band_ddistortion_avx2 Avoid branching and do the entire thing on 32 pixels at once in YMMs. Also make the sao_bands function parameter const.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	1bb9a079a8	Fix indentation	2019-08-07 16:35:24 +03:00
Reima Hyvönen	7bc959c7c5	3 sao functions are now working	2019-08-07 16:35:24 +03:00
Reima Hyvönen	0e0f2d3490	made to clear sum vector after it has been set to memory	2019-08-07 16:35:24 +03:00
Reima Hyvönen	f146de7acb	removed some variables to prevent memory losses	2019-08-07 16:35:24 +03:00
Reima Hyvönen	247c3a7a71	conversed gined to unsigned int	2019-08-07 16:35:24 +03:00
Reima Hyvönen	ac5c216974	Some more memory error preventing to sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	3fb1cbca35	more editing sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	afbb6fb960	some more modifications to sao_edge_ddistortion_avx2 to prevent memory failures	2019-08-07 16:35:24 +03:00
Reima Hyvönen	3496a57f7a	Edited sao_edge_ddistortion_avx2 to avoid memory overflow	2019-08-07 16:35:24 +03:00
Reima Hyvönen	267ba1d6ce	Modified sao_band_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	e70663b245	added some sub commands to avoid memory read errors	2019-08-07 16:35:24 +03:00
Reima Hyvönen	59dfb4570c	Converted some loads to load int8_t instead ints	2019-08-07 16:35:24 +03:00
Reima Hyvönen	8b253209a8	Found false address load from calc_sao_edge_dir. Should now work like generic	2019-08-07 16:35:24 +03:00
Reima Hyvönen	50e0a47b7a	Took away __restrict	2019-08-07 16:35:24 +03:00
Reima Hyvönen	8a39eb674e	Removed c-variable from calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	bc0a36830d	Clerified some 6 pixel loads	2019-08-07 16:35:24 +03:00
Reima Hyvönen	1a8b211e05	Added break to line 170	2019-08-07 16:35:24 +03:00
Reima Hyvönen	d05e750ebe	Added some switches to prevent segmentation fault from reading	2019-08-07 16:35:24 +03:00
Reima Hyvönen	203580047d	Defined some AVX functions	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c884c738b1	Updated some commands to match the standard	2019-08-07 16:35:24 +03:00
Reima Hyvönen	b412ed2f59	Removed some setr and used loads calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c6cc063534	converted some hadd operations at calc_sao_edge_dir_avx2 to cast and extract	2019-08-07 16:35:24 +03:00
Reima Hyvönen	47ac109b10	optimated some sao_reconstruct_color_avx2 when sao->type == SAO_TYPE_BAND	2019-08-07 16:35:24 +03:00
Reima Hyvönen	96dc60a1ed	first working optimation	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c148aff9fb	Some optimation done to function sao_reconstruct_color_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	bf16ba6cc4	Remade sao_edge_ddistortion_avx2 and calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	79dc39a676	Some editing for sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	06ee52924e	some reconst done to calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	5fbc65d823	reconst optimation doesn't work yet	2019-08-07 16:35:24 +03:00
Reima Hyvönen	d29f834a69	Remove useless function	2019-08-07 16:35:24 +03:00
Reima Hyvönen	a232a12160	calc_sao_edge_dir_avx2 updated	2019-08-07 16:35:24 +03:00
Reima Hyvönen	b1febc02a5	sao_edge_ddistortion_avx2 now working proberly	2019-08-07 16:35:24 +03:00
Reima Hyvönen	cd6092a1ec	Still too much bits, looking for where they appear	2019-08-07 16:35:24 +03:00
Reima Hyvönen	7853be8eeb	Incomple optimation	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	8d48bee180	Tidy fast coeff cost code	2019-07-09 18:01:54 +03:00
Pauli Oikkonen	201a43b08e	Clean up the RD-estimation code	2019-07-09 18:01:54 +03:00
Pauli Oikkonen	b111df5073	Create preliminary version of improved cost estimator	2019-07-09 18:01:54 +03:00
Marko Viitanen	8280bd3217	Add channel info to angular_pred and fix the displacement tables. Also includes 4-tap intra filtering code commented out	2019-07-04 09:35:47 +03:00
Pauli Oikkonen	081d16fc33	Fix intrinsics that may be missing on some systems Create a header to collect all the workarounds for missing intrinsics in one place	2019-05-23 19:59:40 +03:00
Marko Viitanen	30a8a7b97c	WIP fixing the last significant xy coding	2019-05-07 15:01:02 +03:00
Pauli Oikkonen	7175d20bb2	Still include stdint.h for non-vector builds	2019-04-15 19:36:01 +03:00
Pauli Oikkonen	1315c7e2b0	Do not compile any vector code for non-SSE4/AVX2 builds	2019-04-15 19:10:48 +03:00
Pauli Oikkonen	f5f70e7bc5	Merge branch 'sad-optimization'	2019-04-15 19:02:01 +03:00
Pauli Oikkonen	6d43759604	Create a border-respecting 32-wide AVX hor_sad	2019-03-07 18:01:22 +02:00
Pauli Oikkonen	f218cecb38	Remove offending hor_sad_avx2_w32 function Consider possibly creating a non-offending AVX2 version instead, the way hor_sad_sse41_w32 works. Or maybe there's more essential work to do.	2019-03-05 22:51:41 +02:00
Pauli Oikkonen	bcd9879359	Include quant coeff range check in non-scaling list execution path too	2019-02-27 17:26:44 +02:00
Pauli Oikkonen	24e6363f64	Remove the kvz_quant_avx2 wrapper function	2019-02-27 16:32:58 +02:00
Pauli Oikkonen	748820f3c5	Eliminate unnecessary loading of coeffs if scaling lists are off	2019-02-27 16:26:35 +02:00
Pauli Oikkonen	5994350f40	Allow quant_flat_avx2 to be used with scaling lists on	2019-02-27 16:25:59 +02:00
Pauli Oikkonen	d8b8923028	Add LGPL notices to reg_sad headers	2019-02-18 17:52:47 +02:00
Pauli Oikkonen	2d05ca8520	Remove width from constant-width hor_sad func params They should kinda know it already	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	dd7d989a39	Implement 32-wide hor_sad on AVX2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	f5ff4db01f	4-wide hor_sad border agnostic	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	35e7f9a700	Fix hor_sad w8 to work with both borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	836783dd6e	Use hor_sad_w32 for both left and right borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	69687c8d24	Modify hor_sad_sse41_w16 to work over left and right borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	768203a2de	First version of arbitrary-width SSE4.1 hor_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ccf683b9b6	Start work on left and right border aware hor_sad Comes with 4, 8, 16 and 32 pixel wide implementations now, at some point investigate if this can start to thrash icache	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	f781dc31f0	Create strategy for ver_sad Easy to vectorize	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91cb0fbd45	Create strategy for directly obtaining pointer to constant-width SAD function	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	cbca3347b5	Unroll 64-wide AVX2 SAD by 2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	84cf771dea	Unroll 32 and 16 wide SAD vector implementations by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a711ce3df5	Inline fixed width vectorized SAD functions	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	6504145cce	Remove 16-pixel wide AVX2 SAD implementation At least on Skylake, it's noticeably slower than the very simple version using SSE4.1	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4cb371184b	Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	796568d9cc	Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4d45d828fa	Use constant-width SSE4.1 SAD funcs for AVX2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a13fc51003	Include a blank AVX2 strategy registration function even in non-AVX2 builds	2019-02-04 19:52:24 +02:00
Pauli Oikkonen	d55414db66	Only build AVX2 coeff encoding when supported ..whoops	2019-02-04 19:34:30 +02:00
Pauli Oikkonen	3fe2f29456	Merge branch 'encode-coeffs-avx2'	2019-02-04 18:52:31 +02:00
Pauli Oikkonen	722b738888	Fix more naming issues	2019-02-04 16:05:43 +02:00
Pauli Oikkonen	e26d98fb75	Rename a couple variables and add crucial comments	2019-02-04 15:57:07 +02:00
Pauli Oikkonen	f186455619	Move encode_last_significant_xy out of strategy modules It's the exact same in both AVX2 and generic, and does not seem to be worth even trying to vectorize	2019-02-04 14:55:41 +02:00
Pauli Oikkonen	3f7340c932	Fine-tune pack_16x16b_to_16x2b Avoid mm_set1 operation when it's possible to create the constant with one bit-shift operation from another instead. Thanks Intel for 3-operand instruction encoding!	2019-02-04 14:44:47 +02:00
Pauli Oikkonen	314f5b0e1f	Rename 16x2b cmpgt function, comment it better, optimize it slightly Eliminate an unnecessary bit masking to make it even more messy	2019-02-04 14:44:32 +02:00
Pauli Oikkonen	d8ff6a6459	Fix _andn_u32 to work on old Visual Studio	2019-02-01 15:34:42 +02:00
Pauli Oikkonen	3a1f2eb752	Prefer SSE4.1 implementation of SAD over AVX2 It seems that the 128-bit wide version consistently outperforms the 256-bit one	2019-01-10 13:48:55 +02:00
Pauli Oikkonen	9b24d81c6a	Use SSE instead of AVX for small widths Highly dubious if this will help performance at all	2019-01-07 20:12:13 +02:00
Pauli Oikkonen	887d7700a8	Modify AVX2 SAD to mask data by byte granularity in AVX registers Avoids using any SAD calculations narrower than 256 bits, and simplifies the code. Also improves execution speed	2019-01-07 18:53:15 +02:00
Pauli Oikkonen	7585f79a71	AVX2-ize SAD calculation Performance is no better than SSE though	2019-01-07 16:26:24 +02:00
Pauli Oikkonen	ab3dc58df6	Copy SAD SSE4.1 impl to AVX2	2019-01-03 18:31:57 +02:00
Pauli Oikkonen	45ac6e6d03	Tidy pack_16x16b_to_16x2b comments	2019-01-03 16:37:05 +02:00
Pauli Oikkonen	016eb014ad	Move packing 16x16b -> 16x2b into separate function	2018-12-20 10:51:44 +02:00
Ari Lemmetti	b234897e8a	Fix smp and amp blocks in fme and revert previous change. Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc. Calculate SATD on the 8x4, ... part	2018-12-19 21:30:53 +02:00
Pauli Oikkonen	9aaa6f260d	Fixes to enable portability	2018-12-18 20:42:09 +02:00
Pauli Oikkonen	2fdbbe9730	Move CG reordering code from quant-avx2 to shared header	2018-12-18 19:42:18 +02:00
Pauli Oikkonen	d02207306d	Create a header file for shared AVX2 code	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	361bf0c7db	Precompute >=2 coeff encoding loop with 2-bit arithmetic Who needs 16x16b vectors when you can do practically the same with 16x2b pseudovectors in 32-bit general purpose registers!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	f66cb23d5b	Optimize greater1 encoding loop Calculating the c1 variable need not be a serial operation!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	8c8b791c35	Vectorize kvz_context_get_sig_ctx_inc	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	033261eb74	Eliminate two branches using bit magic	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c4434e8d04	Scan CG's in forward order to simplify finding last significant	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	efd097f5a5	Vectorize the coeff group loop to some extent	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	a01362e638	use the efficient method of reordering raster->scan	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	50a888e789	Use the efficient method to find first and last nz coeffs in block	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	7e9203f566	Scan coeff groups in scan order to help find last significant one	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	9a5a6fdbc7	Simplify two ifs in encode_coeff_nxn-avx2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	37a2a8bac8	See if loop can be optimized by rearranging	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	584f2f74b6	Vectorize significant coeff group scanning loop	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	1bfed73221	Add AVX2 strategy for encode_coding_tree	2018-12-18 19:41:09 +02:00
Reima Hyvönen	1fcc5c6a8d	Merge branch 'bipred_recon'	2018-12-11 09:59:35 +02:00
Reima Hyvönen	e4a10880f3	Added case 12 to bipred_recon no mov	2018-12-11 09:52:17 +02:00
Marko Viitanen	a4f3968e52	Fix Visual Studio errors by initializing some variables used in AVX2 signhiding	2018-12-11 09:33:26 +02:00
Pauli Oikkonen	c465578048	Add a descriptive comment to coefficient reordering	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	f78bf2ebcb	Optimize q_coefs usage for indexed fetch	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	d9591f1b49	Eliminate midway buffering of reordered coefs TODO: For some mysterious reason seems slightly slower than the buffered one	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	7fe454c51f	Optimize get_cheapest_alternative()	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	6bbd3e5a44	Optimize rearrange_512 function	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	cb8209d1b3	Vectorize transform coefficient reordering loop	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	7cf4c7ae5f	Rename "reduce" functions to hsum That's what the functions fundamendally do anyway	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	316cd8a846	Fix ALIGNED keyword and grow alignment to 64B	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	1befc69a4c	Implement sign bit hiding in AVX2	2018-12-03 15:36:32 +02:00
Reima Hyvönen	f8696b54a4	Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks that can be not equal to 8 (ie. width = 12)	2018-11-20 17:09:19 +02:00
Reima Hyvönen	710ba288db	Chroma has some problems	2018-11-15 16:42:48 +02:00
Ari Lemmetti	a832206bb6	Replace 32-bit incompatible instrinsics	2018-11-12 18:54:33 +02:00
Ari Lemmetti	5c774c4105	Rewrite most of FME and interpolation filters Changes had to break a lot of stuff and were just squashed into this horrible code dump	2018-11-08 20:21:16 +02:00
Reima Hyvönen	7406c33a42	Some more cleaning	2018-10-26 12:25:18 +03:00
Reima Hyvönen	4c71546b2e	Cleaned some coding	2018-10-26 12:19:44 +03:00
Reima Hyvönen	4fe3909e48	Switched luma to use 32bits size ints intstead of 16bit size	2018-10-24 18:24:46 +03:00
Marko Viitanen	e015d7eb2b	Fix compiler warnings	2018-10-17 10:43:11 +03:00
Reima Hyvönen	381e786e10	Trying to find the bug in luma	2018-10-11 18:08:41 +03:00
Reima Hyvönen	2f5f81bac3	removed the non-optimated bipred function	2018-10-09 11:19:23 +03:00
Reima Hyvönen	212a8e68fa	Modified to avoid memory overflow, still some bug inside luma	2018-10-02 20:23:32 +03:00
Reima Hyvönen	896034b7cf	Some renamed functions back	2018-08-28 15:31:10 +03:00
Reima Hyvönen	7de5c74434	Updated bipred_recon to work faster	2018-08-28 15:12:31 +03:00
Reima Hyvönen	2ca99a44e8	Updated shuffle operation to be in right order	2018-08-27 18:16:38 +03:00
Marko Viitanen	4f7da86285	Commented out sign hiding code, which is not used in VVC	2018-08-17 09:38:11 +03:00
Reima Hyvönen	508b218a12	some modifications made to prevent reading too much	2018-08-14 10:50:39 +03:00
Reima Hyvönen	1d935ee888	some useless stuff removed	2018-08-13 16:47:11 +03:00
Reima Hyvönen	ce3ac4c05e	some modifications to no_mov	2018-08-13 16:41:02 +03:00
Reima Hyvönen	15a613ae94	test if no_mov breaks testing	2018-08-13 16:02:56 +03:00
Reima Hyvönen	97a2049e58	removed pointer declaration out from switch	2018-08-10 16:42:26 +03:00
Reima Hyvönen	aa94bcedbc	Stream is now pointer	2018-08-10 16:38:49 +03:00
Reima Hyvönen	fa5b227ece	256 to 32 doesn't work, made them by hand	2018-08-10 16:01:20 +03:00
Reima Hyvönen	408dedbcc8	removed _mm256_extract_epi8 and replaced with _mm_stream	2018-08-10 15:53:26 +03:00
Reima Hyvönen	31c35091c6	_mm256_cvtsi256_si32 removed	2018-08-10 10:06:40 +03:00
Reima Hyvönen	99dc43074f	_mm256_cvtsi256_si32 breaks system, too much bits. back to extract	2018-08-10 09:59:33 +03:00
Reima Hyvönen	4f1f80b2cb	Transformed convert from 256 to cast 256 -> 128 and then convert from 128	2018-08-09 15:35:54 +03:00
Reima Hyvönen	4957555eb3	Removed leftover from 939	2018-08-09 15:25:03 +03:00
Reima Hyvönen	28b165c971	Clearified some sections, added _MM_SHUFFLE macro	2018-08-09 15:23:01 +03:00
Reima Hyvönen	dd04df8667	testing if error in both avx2 functions	2018-08-03 11:49:00 +03:00
Reima Hyvönen	ed50d71fde	Switched some variables to different location, altered inter_recon_bipred_avx2 function	2018-08-02 16:08:59 +03:00
Reima Hyvönen	f5739a0028	Renaming and removing useless prints	2018-08-02 14:47:17 +03:00

384 commits