hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-24 02:24:07 +00:00

Author	SHA1	Message	Date
Marko Viitanen	7a8641b002	Merge pull request #224 from jbeich/powerpc: Switch AltiVec on Linux to getauxval()	2019-04-08 08:24:12 +03:00
Jan Beich	85f46e17a9	Detect AltiVec via elf_aux_info() on FreeBSD 12+	2019-04-01 13:08:04 +00:00
Jan Beich	82486255da	Simplify AltiVec detection on Linux	2019-04-01 13:08:04 +00:00
Eemeli Kallio	329c72a485	Changed tab from README to spaces	2019-03-13 16:24:32 +02:00
Eemeli Kallio	48e83ece9e	Updated --max-merge to README	2019-03-13 15:28:10 +02:00
Eemeli Kallio	2ce1ef25c5	Fixed project files that were changed in merge	2019-03-05 14:51:36 +02:00
Eemeli Kallio	c159e275b7	Merge branch 'max_merge'	2019-03-05 14:39:03 +02:00
Pauli Oikkonen	2e98b57b73	Merge remote-tracking branch 'origin/quant-avx2-scaling-lists'	2019-03-04 19:13:48 +02:00
Pauli Oikkonen	bcd9879359	Include quant coeff range check in non-scaling list execution path too	2019-02-27 17:26:44 +02:00
Pauli Oikkonen	24e6363f64	Remove the kvz_quant_avx2 wrapper function	2019-02-27 16:32:58 +02:00
Pauli Oikkonen	748820f3c5	Eliminate unnecessary loading of coeffs if scaling lists are off	2019-02-27 16:26:35 +02:00
Pauli Oikkonen	5994350f40	Allow quant_flat_avx2 to be used with scaling lists on	2019-02-27 16:25:59 +02:00
Eemeli Kallio	7f4e0acf41	Added check if max-merge is out of bounds	2019-02-19 13:53:42 +02:00
Eemeli Kallio	2a40560888	some variables to const	2019-02-12 11:24:10 +02:00
Eemeli Kallio	8f8e7bb53c	Added possibility to reduce the maximum number of merge candidates.	2019-02-12 09:21:03 +02:00
Pauli Oikkonen	a13fc51003	Include a blank AVX2 strategy registration function even in non-AVX2 builds	2019-02-04 19:52:24 +02:00
Pauli Oikkonen	d55414db66	Only build AVX2 coeff encoding when supported ..whoops	2019-02-04 19:34:30 +02:00
Pauli Oikkonen	3fe2f29456	Merge branch 'encode-coeffs-avx2'	2019-02-04 18:52:31 +02:00
Pauli Oikkonen	722b738888	Fix more naming issues	2019-02-04 16:05:43 +02:00
Pauli Oikkonen	e26d98fb75	Rename a couple variables and add crucial comments	2019-02-04 15:57:07 +02:00
Pauli Oikkonen	f186455619	Move encode_last_significant_xy out of strategy modules. It's the exact same in both AVX2 and generic, and does not seem to be worth even trying to vectorize	2019-02-04 14:55:41 +02:00
Pauli Oikkonen	3f7340c932	Fine-tune pack_16x16b_to_16x2b. Avoid mm_set1 operation when it's possible to create the constant with one bit-shift operation from another instead. Thanks Intel for 3-operand instruction encoding!	2019-02-04 14:44:47 +02:00
Pauli Oikkonen	314f5b0e1f	Rename 16x2b cmpgt function, comment it better, optimize it slightly. Eliminate an unnecessary bit masking to make it even more messy	2019-02-04 14:44:32 +02:00
Pauli Oikkonen	d8ff6a6459	Fix _andn_u32 to work on old Visual Studio	2019-02-01 15:34:42 +02:00
Pauli Oikkonen	bed93fb7f5	Merge branch 'sad-avx2'	2019-01-10 17:48:09 +02:00
Pauli Oikkonen	26e1b2c783	Use (u)int32_t instead of (unsigned) int in reg_sad_sse41	2019-01-10 14:37:04 +02:00
Pauli Oikkonen	3a1f2eb752	Prefer SSE4.1 implementation of SAD over AVX2. It seems that the 128-bit wide version consistently outperforms the 256-bit one	2019-01-10 13:48:55 +02:00
Pauli Oikkonen	9b24d81c6a	Use SSE instead of AVX for small widths. Highly dubious if this will help performance at all	2019-01-07 20:12:13 +02:00
Pauli Oikkonen	b2176bf72a	Optimize SSE4.1 version of SAD. Make it use the same vblend trick as AVX2. Interestingly, on my test setup this seems to be faster than the same code using 256-bit AVX vectors.	2019-01-07 19:40:57 +02:00
Pauli Oikkonen	887d7700a8	Modify AVX2 SAD to mask data by byte granularity in AVX registers. Avoids using any SAD calculations narrower than 256 bits, and simplifies the code. Also improves execution speed	2019-01-07 18:53:15 +02:00
Pauli Oikkonen	7585f79a71	AVX2-ize SAD calculation. Performance is no better than SSE though	2019-01-07 16:26:24 +02:00
Pauli Oikkonen	ab3dc58df6	Copy SAD SSE4.1 impl to AVX2	2019-01-03 18:31:57 +02:00
Pauli Oikkonen	45ac6e6d03	Tidy pack_16x16b_to_16x2b comments	2019-01-03 16:37:05 +02:00
Ari Lemmetti	cd818db724	Add missing quantization and residual in cost calculation (inter rd=2).	2018-12-21 15:55:29 +02:00
Pauli Oikkonen	016eb014ad	Move packing 16x16b -> 16x2b into separate function	2018-12-20 10:51:44 +02:00
Ari Lemmetti	b234897e8a	Fix smp and amp blocks in fme and revert previous change. Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc. Calculate SATD on the 8x4, ... part	2018-12-19 21:30:53 +02:00
Pauli Oikkonen	3b635309a1	Add new files to Visual Studio project	2018-12-18 20:48:41 +02:00
Pauli Oikkonen	9aaa6f260d	Fixes to enable portability	2018-12-18 20:42:09 +02:00
Pauli Oikkonen	2fdbbe9730	Move CG reordering code from quant-avx2 to shared header	2018-12-18 19:42:18 +02:00
Pauli Oikkonen	d02207306d	Create a header file for shared AVX2 code	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	361bf0c7db	Precompute >=2 coeff encoding loop with 2-bit arithmetic. Who needs 16x16b vectors when you can do practically the same with 16x2b pseudovectors in 32-bit general purpose registers!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	940b0e9e6a	Require BMI2 for AVX2 build. Any processor implementing AVX2 should also implement BMI2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	f66cb23d5b	Optimize greater1 encoding loop. Calculating the c1 variable need not be a serial operation!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	8c8b791c35	Vectorize kvz_context_get_sig_ctx_inc	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	033261eb74	Eliminate two branches using bit magic	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c4434e8d04	Scan CG's in forward order to simplify finding last significant	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	efd097f5a5	Vectorize the coeff group loop to some extent	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	a01362e638	use the efficient method of reordering raster->scan	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	50a888e789	Use the efficient method to find first and last nz coeffs in block	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	7e9203f566	Scan coeff groups in scan order to help find last significant one	2018-12-18 19:41:09 +02:00
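
Several commits above (361bf0c7db, 016eb014ad, 3f7340c932) refer to packing sixteen 16-bit values into a "16x2b pseudovector" held in a 32-bit general purpose register. The following is only a minimal scalar sketch of that idea, not the repository's AVX2/BMI2 code: the function names pack_16x2b_scalar and get_lane_2b are hypothetical, and it assumes each input value already fits in 2 bits (0..3), with lane i stored in bits 2i and 2i+1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the repository): pack 16 values into one
 * 32-bit word, 2 bits per lane. Assumes each value is in the range 0..3;
 * higher bits are masked off. Lane i occupies bits 2*i and 2*i + 1. */
static uint32_t pack_16x2b_scalar(const uint16_t v[16])
{
    uint32_t packed = 0;
    for (int i = 0; i < 16; i++) {
        packed |= (uint32_t)(v[i] & 3u) << (2 * i);
    }
    return packed;
}

/* Hypothetical helper: read one 2-bit lane back out of the packed word. */
static unsigned get_lane_2b(uint32_t packed, int lane)
{
    assert(lane >= 0 && lane < 16);
    return (packed >> (2 * lane)) & 3u;
}
```

Once the lanes live in a single register, ordinary shifts and masks act on all sixteen of them at once, which is presumably the appeal noted in the commit message.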


2772 commits