hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-24 02:24:07 +00:00

Author	SHA1	Message	Date
mercat	57e8c3ebc2	Merge branch 'ML-cplx_red_ICIP'	2019-11-07 13:25:47 +02:00
Pauli Oikkonen	6d7a4f555c	Also remove 16x16 (A * B^T)^T matrix multiply. Can be done using (B * A^T) instead, it's the exact same	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	2c2deb2366	Tidy AVX2 32x32 matrix multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	98ad78b333	Tidy the old AVX2 32x32 matrix multiply. It was actually a very good algorithm, just looked messy!	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	4a921cbdb5	Retain data as much in YMM registers as possible. This seems to make it a whole lot quicker	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	9589baccac	Add more swap filename patterns to .gitignore	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ac4d710e23	Unroll 32x32 matrix multiply, use all regs	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	a58608d0b8	Remove totally unnecessary (A * B^T)^T 32x32 multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	043f53539f	Implement a streamlined matrix-multiply 32x32 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e9da2d851b	Tidy 32x32 fast DCT's helper functions	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e382339182	Implement fast (butterfly) 32x32 DCT in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	b5962dadac	Tidy indentation in AVX2 16x16 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	36a8f89025	Fine-tune 16x16 AVX2 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	2b95d9cdd6	Align all DCT test buffers to 32 bytes. Now that most AVX2 DCTs use MOVDQA instead of MOVDQU, also adapt the tests to that.	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ca9409de2b	Implement 16x16 DCT as butterfly algorithm in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	7c69a26717	Use aligned loads and stores for AVX2 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e9c65dca6	Align DCT matrices and temp transform buffers	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	148a150522	Align DCT source and dest blocks to cache line	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e60bbf6a6	Slightly tune 16x16 forward DCT. Use an array of __m256i's to store temporary value, essentially letting the compiler enforce alignment and use aligned loads and stores.	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	c0cc0e8a75	Optimize 16x16 multiply by only slicing right mat once	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e463d27f22	Implement streamlined generic 16x16 matrix multiply. It can't be this fast for real, can it?	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	beb85ce9d6	Reorder parameters for 8x8 matrix multiplies	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	292af62256	Implement tailored 16x16 forward DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	30ce461d98	Redo 4x4 matrix multiplication	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	07970ea82f	Streamline by-the-book 8x8 matrix multiplication. Also chop up the forward transform into two tailored multiply functions	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	7ec7ab3361	Implement a tailored AVX2 8x8 DCT	2019-10-28 16:19:42 +02:00
Marko Viitanen	ad7c8d40bc	Merge pull request #247 from pkubaj/master: Fix build on powerpc64 with LLVM	2019-09-12 16:11:19 +03:00
pkubaj	1d7fcf4227	Fix build on powerpc64 with LLVM	2019-09-12 15:05:00 +02:00
mercat	0de567bfa4	Fix memory leak	2019-09-12 09:45:32 +03:00
mercat	fa116de619	Add static	2019-09-11 16:18:12 +03:00
mercat	5cb2fbba16	Merge branch 'ML-cplx_red_ICIP' of gitlab.tut.fi:TIE/ultravideo/kvazaar into ML-cplx_red_ICIP	2019-09-11 16:12:47 +03:00
mercat	b8753a9293	Fucking INLINE fixed	2019-09-11 16:12:07 +03:00
mercat	b855144e68	INLINE fix	2019-09-11 16:12:07 +03:00
mercat	694337b803	Add const and more const	2019-09-11 16:12:07 +03:00
mercat	21c07638ed	Remove const into kvz_init_constraint.	2019-09-11 16:12:06 +03:00
mercat	2bca507abe	Clean version of machine learning constraint code. (ICIP paper)	2019-09-11 16:12:06 +03:00
Alexandre Mercat	0f4b7be6ee	First version of ML ICIP code for master	2019-09-11 16:12:06 +03:00
mercat	808eb4ff96	Fucking INLINE fixed	2019-09-11 16:08:31 +03:00
mercat	35fd556321	INLINE fix	2019-09-11 16:05:31 +03:00
mercat	5beb23d91c	Add const and more const	2019-09-11 16:03:03 +03:00
mercat	6cda8036c9	Remove const into kvz_init_constraint.	2019-09-11 15:57:15 +03:00
mercat	1dac29d9a0	Clean version of machine learning constraint code. (ICIP paper)	2019-09-11 15:49:56 +03:00
Marko Viitanen	4007485420	Update the ffmpeg version used in the tests	2019-09-11 14:52:30 +03:00
Marko Viitanen	da5dca057d	Change libtool path in tests to fix travis builds	2019-09-11 09:33:43 +03:00
Pauli Oikkonen	99597b828a	Work around the ancient Win32 calling convention hassle. See if this'll work now	2019-09-06 13:14:42 +03:00
Pauli Oikkonen	c5ca18950c	Revert "Revert to `6924d90052` due to broken visual studio build". This reverts commit `1dd0619bd7`.	2019-09-05 18:21:55 +03:00
Pauli Oikkonen	55529decd5	Implement _mm256_insert_epi32 and extract pseudo-ops. Visual Studio headers apparently lack these guys	2019-09-05 18:20:52 +03:00
Ari Lemmetti	4e94d60552	Merge branch 'smp-merge-analysis'	2019-09-03 16:47:07 +03:00
Ari Lemmetti	147378e1f9	Prevent 8x4 and 4x8 bipred in merge analysis	2019-09-03 16:32:50 +03:00
Ari Lemmetti	ef1fdbf259	Separate prediction of single PU/PB from CU/CB	2019-09-03 16:32:50 +03:00

2971 commits