hashirama/uvg266

Mirror of https://github.com/ultravideo/uvg266.git, synced 2024-12-04 13:54:05 +00:00

Author	SHA1	Message	Date
Pauli Oikkonen	fa4bb86406	Optimize intra_pred_planar_avx2 for 4x4 blocks	2019-11-19 13:39:02 +02:00
Pauli Oikkonen	4761d228f9	Start to vectorize the 4x4 loop	2019-11-15 17:32:40 +02:00
Pauli Oikkonen	8d45ab4951	Stupidify the 4x4 planar loop for vectorization	2019-11-14 17:14:04 +02:00
Pauli Oikkonen	6d7a4f555c	Also remove 16x16 (A * B^T)^T matrix multiply. Can be done using (B * A^T) instead, it's the exact same	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	2c2deb2366	Tidy AVX2 32x32 matrix multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	98ad78b333	Tidy the old AVX2 32x32 matrix multiply. It was actually a very good algorithm, just looked messy!	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	4a921cbdb5	Retain data as much in YMM registers as possible. This seems to make it a whole lot quicker	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ac4d710e23	Unroll 32x32 matrix multiply, use all regs	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	a58608d0b8	Remove totally unnecessary (A * B^T)^T 32x32 multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	043f53539f	Implement a streamlined matrix-multiply 32x32 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e9da2d851b	Tidy 32x32 fast DCT's helper functions	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e382339182	Implement fast (butterfly) 32x32 DCT in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	b5962dadac	Tidy indentation in AVX2 16x16 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	36a8f89025	Fine-tune 16x16 AVX2 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ca9409de2b	Implement 16x16 DCT as butterfly algorithm in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	7c69a26717	Use aligned loads and stores for AVX2 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e9c65dca6	Align DCT matrices and temp transform buffers	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	148a150522	Align DCT source and dest blocks to cache line	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e60bbf6a6	Slightly tune 16x16 forward DCT. Use an array of __m256i's to store temporary value, essentially letting the compiler enforce alignment and use aligned loads and stores.	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	c0cc0e8a75	Optimize 16x16 multiply by only slicing right mat once	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e463d27f22	Implement streamlined generic 16x16 matrix multiply. It can't be this fast for real, can it?	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	beb85ce9d6	Reorder parameters for 8x8 matrix multiplies	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	292af62256	Implement tailored 16x16 forward DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	30ce461d98	Redo 4x4 matrix multiplication	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	07970ea82f	Streamline by-the-book 8x8 matrix multiplication. Also chop up the forward transform into two tailored multiply functions	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	7ec7ab3361	Implement a tailored AVX2 8x8 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	99597b828a	Work around the ancient Win32 calling convention hassle. See if this'll work now	2019-09-06 13:14:42 +03:00
Pauli Oikkonen	c5ca18950c	Revert "Revert to `6924d90052` due to broken visual studio build" This reverts commit `1dd0619bd7`.	2019-09-05 18:21:55 +03:00
Pauli Oikkonen	55529decd5	Implement _mm256_insert_epi32 and extract pseudo-ops. Visual Studio headers apparently lack these guys	2019-09-05 18:20:52 +03:00
Ari Lemmetti	557bcbc6aa	Make luma or chroma only inter "recon" or predict possible	2019-09-02 17:15:28 +03:00
Ari Lemmetti	1dd0619bd7	Revert to `6924d90052` due to broken visual studio build	2019-08-08 15:15:34 +03:00
Pauli Oikkonen	2852baa673	Separate sign3_diff_epu8 from calc_eo_cat. Just to keep things simple, clear and obvious	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	a858e7dd4b	Combine duplicate code into inline functions	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	de0e97f711	Take 8/16/24b loads and stores into separate functions	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	10979f58fe	Tidy up code	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	9cc11976c0	Combine the delta accumulation from edge and band ddistortion into shared func. This won't reduce object size, but there'll be less duplicate code	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	55d877bd66	Vectorize sao_edge_ddistortion	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	aef0f301d3	Fix function signatures. Mark anything intended as read-only to be const, and fix alignment	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	997fd369b3	Redo calc_sao_edge_dir_avx2. Do it wider, 32 pixels at once!	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	db1e475e02	Use i32 instead of i8 for x/y offsets. Doesn't matter too much, because this number isn't used in SIMD computation, only as a memory reference offset.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	12de466ef5	Reimplement non-band SAO color reconstruction in AVX2. Streamline things to work on 32 pixels at once instead of 8	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	e8bff99329	Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction. Vectorize it all, hope this helps with perf	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	7b5dffa855	Implement calc_sao_offset_array in AVX2. To be efficient, the AVX2 color reconstruction algorithm will need offsets in byte, not dword, arrays. This is completely specific to 8-bit pixels and the function signature is fundamentally distinct from the generic algorithm, so it's better to not strategize SAO offset array calculation.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	08881f5e9b	(TEMP) (TODO) (whatever) Avoid compiler warnings. I want the CI to not crash on its -Wall -Werror, but instead to actually build the thing and report me about actual memory errors etc	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	c18adc5ee0	Redo sao_band_ddistortion_avx2. Avoid branching and do the entire thing on 32 pixels at once in YMMs. Also make the sao_bands function parameter const.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	1bb9a079a8	Fix indentation	2019-08-07 16:35:24 +03:00
Reima Hyvönen	7bc959c7c5	3 sao functions are now working	2019-08-07 16:35:24 +03:00
Reima Hyvönen	0e0f2d3490	made to clear sum vector after it has been set to memory	2019-08-07 16:35:24 +03:00
Reima Hyvönen	f146de7acb	removed some variables to prevent memory losses	2019-08-07 16:35:24 +03:00
Reima Hyvönen	247c3a7a71	converted signed to unsigned int	2019-08-07 16:35:24 +03:00


318 commits
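Commit 6d7a4f555c drops the (A * B^T)^T multiply because it equals B * A^T: transposing a product reverses the factor order and transposes each factor, so (A B^T)^T = (B^T)^T A^T = B A^T. A routine computing X * Y^T therefore covers both cases just by swapping its arguments. The C sketch below checks that identity on small integer matrices. The names mul_xyt and transpose_identity_holds and the 4x4 size are illustrative assumptions, not uvg266 functions (the commits concern 16x16 and 32x32 AVX2 code).

```c
#include <assert.h>   /* assert() is used in the usage check */
#include <stdbool.h>
#include <stdint.h>

#define DIM 4  /* illustration size only; hypothetical */

/* out = X * Y^T for DIM x DIM row-major matrices:
 * out[i][j] = sum_k X[i][k] * Y[j][k] */
static void mul_xyt(const int32_t *x, const int32_t *y, int32_t *out)
{
  for (int i = 0; i < DIM; i++) {
    for (int j = 0; j < DIM; j++) {
      int32_t acc = 0;
      for (int k = 0; k < DIM; k++)
        acc += x[i * DIM + k] * y[j * DIM + k];
      out[i * DIM + j] = acc;
    }
  }
}

/* Returns true if (A * B^T)^T == B * A^T element by element.
 * Element (i, j) of the transposed product is element (j, i) of A * B^T. */
static bool transpose_identity_holds(const int32_t *a, const int32_t *b)
{
  int32_t abt[DIM * DIM], bat[DIM * DIM];
  mul_xyt(a, b, abt);  /* A * B^T */
  mul_xyt(b, a, bat);  /* B * A^T, same routine with swapped arguments */
  for (int i = 0; i < DIM; i++)
    for (int j = 0; j < DIM; j++)
      if (abt[j * DIM + i] != bat[i * DIM + j])
        return false;
  return true;
}
```

Because the identity holds for any A and B, keeping a separate "(A * B^T)^T" kernel only duplicates code that the X * Y^T kernel already provides.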
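Commits 8d45ab4951, 4761d228f9 and fa4bb86406 rework the 4x4 loop of intra_pred_planar_avx2. As a scalar reference for what such a loop computes, the sketch below implements the HEVC planar formula for square blocks (VVC generalizes it to rectangular blocks): each sample averages a horizontal interpolation from its left sample toward the above-right sample and a vertical interpolation from its top sample toward the below-left sample. The function name planar_pred and its reference layout (n + 1 samples per side, with the corner sample at index n) are assumptions for illustration and do not reproduce the uvg266 signature.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar planar predictor for an n x n block, n = 1 << log2_n.
 * top:  n + 1 samples above the block; top[n] is the above-right sample.
 * left: n + 1 samples left of the block; left[n] is the below-left sample.
 * dst:  n * n predicted samples, row-major. */
static void planar_pred(const uint8_t *top, const uint8_t *left,
                        int log2_n, uint8_t *dst)
{
  assert(log2_n >= 2 && log2_n <= 5);  /* 4x4 .. 32x32 */
  const int n = 1 << log2_n;
  const int top_right = top[n];
  const int bottom_left = left[n];

  for (int y = 0; y < n; y++) {
    for (int x = 0; x < n; x++) {
      /* Horizontal: left[y] weighted toward the above-right sample. */
      int hor = (n - 1 - x) * left[y] + (x + 1) * top_right;
      /* Vertical: top[x] weighted toward the below-left sample. */
      int ver = (n - 1 - y) * top[x] + (y + 1) * bottom_left;
      /* Sum of both, rounded, divided by 2n; result stays within 0..255. */
      dst[y * n + x] = (uint8_t)((hor + ver + n) >> (log2_n + 1));
    }
  }
}
```

For a 4x4 block with all references equal to 100, every predicted sample is 100, since both interpolations are flat. The per-sample weights depend only on x and y, which is what makes the 4x4 case amenable to computing whole rows or the whole block with vector multiplies.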
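Commit 2852baa673 splits sign3_diff_epu8 out of calc_eo_cat. In HEVC SAO edge offset, each sample c is compared with its two neighbours a and b along the chosen direction, and the sum sign(c - a) + sign(c - b), which lies in [-2, 2], selects one of five edge categories. The scalar sketch below shows that mapping. sign3_diff and sao_edge_category are hypothetical scalar analogues; the AVX2 helpers named in the commit operate on whole vectors of unsigned 8-bit pixels.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar helper: sign of (a - b) as -1, 0 or +1, without branches. */
static int sign3_diff(uint8_t a, uint8_t b)
{
  return (a > b) - (a < b);
}

/* SAO edge-offset category of sample c with neighbours a and b:
 * 0 = no offset, 1 = local minimum, 2 = concave edge,
 * 3 = convex edge, 4 = local maximum. */
static int sao_edge_category(uint8_t a, uint8_t c, uint8_t b)
{
  /* Indexed by sign sum + 2: -2 -> 1, -1 -> 2, 0 -> 0, 1 -> 3, 2 -> 4 */
  static const int map[5] = {1, 2, 0, 3, 4};
  int s = sign3_diff(c, a) + sign3_diff(c, b);
  return map[s + 2];
}
```

The lookup table keeps the category computation branch-free; a vector version can do the same with a byte shuffle on the sign sums.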
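Commits ca9409de2b and e382339182 implement the 16x16 and 32x32 DCTs as butterfly algorithms in AVX2. The idea is visible at the smallest size: the even rows of the HEVC 4-point DCT basis are symmetric and the odd rows antisymmetric, so the even outputs depend only on the sums src[0] + src[3] and src[1] + src[2], and the odd outputs only on the corresponding differences. The sketch below computes one 1-D 4-point transform both ways; the butterfly needs 6 multiplications instead of 16. It omits the rounding shifts that real encoders apply between passes, and dct4_matrix and dct4_butterfly are illustrative names. Larger butterfly transforms typically repeat this even/odd split on the even half; this sketch does not reproduce the uvg266 AVX2 code.

```c
#include <assert.h>
#include <stdint.h>

/* HEVC 4-point forward DCT basis; each row is one basis function. */
static const int32_t kDct4[4][4] = {
  {64,  64,  64,  64},
  {83,  36, -36, -83},
  {64, -64, -64,  64},
  {36, -83,  83, -36},
};

/* Reference: dst = kDct4 * src (16 multiplications, no rounding shift). */
static void dct4_matrix(const int32_t *src, int32_t *dst)
{
  for (int i = 0; i < 4; i++) {
    int32_t acc = 0;
    for (int k = 0; k < 4; k++)
      acc += kDct4[i][k] * src[k];
    dst[i] = acc;
  }
}

/* Butterfly: even rows see only the sums E, odd rows only the differences O. */
static void dct4_butterfly(const int32_t *src, int32_t *dst)
{
  const int32_t e0 = src[0] + src[3], o0 = src[0] - src[3];
  const int32_t e1 = src[1] + src[2], o1 = src[1] - src[2];
  const int32_t e0s = 64 * e0, e1s = 64 * e1;  /* shared by dst[0] and dst[2] */
  dst[0] = e0s + e1s;
  dst[2] = e0s - e1s;
  dst[1] = 83 * o0 + 36 * o1;
  dst[3] = 36 * o0 - 83 * o1;
}
```

For src = {1, 2, 3, 4} both functions return {640, -285, 0, -25}.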