hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-24 02:24:07 +00:00

Author	SHA1	Message	Date
Pauli Oikkonen	169314de4f	32x32 filtered DC prediction in AVX2	2019-12-11 18:17:06 +02:00
Pauli Oikkonen	fb2481b7e4	16x16 filtered DC implemented in AVX2	2019-12-10 15:54:50 +02:00
Pauli Oikkonen	da370ea36d	Implement AVX2 8x8 filtered DC algorithm	2019-11-28 14:10:10 +02:00
Pauli Oikkonen	5d9b7019ca	Implement a 4x4 filtered DC pred function	2019-11-26 17:05:54 +02:00
Pauli Oikkonen	f1485ab087	Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes?	2019-11-25 15:20:29 +02:00
Pauli Oikkonen	979d66031c	Create a strategy out of intra_pred_filtered_dc	2019-11-19 14:50:31 +02:00
Pauli Oikkonen	fa4bb86406	Optimize intra_pred_planar_avx2 for 4x4 blocks	2019-11-19 13:39:02 +02:00
Pauli Oikkonen	4761d228f9	Start to vectorize the 4x4 loop	2019-11-15 17:32:40 +02:00
Pauli Oikkonen	8d45ab4951	Stupidify the 4x4 planar loop for vectorization	2019-11-14 17:14:04 +02:00
Pauli Oikkonen	6f13f6525c	Merge branch 'new_prints'	2019-11-07 17:04:21 +02:00
mercat	57e8c3ebc2	Merge branch 'ML-cplx_red_ICIP'	2019-11-07 13:25:47 +02:00
Pauli Oikkonen	558f0ec401	Mbps, not mbps	2019-11-05 18:06:00 +02:00
Pauli Oikkonen	2edf533925	Tidy the end report printing. Also fix a bug with non-integer target FPS	2019-11-05 17:20:00 +02:00
Pauli Oikkonen	c7313ce567	Store AVG QP information in encmain	2019-11-04 17:08:07 +02:00
Reima Hyvönen	80575c59bf	Some updates done to get right bitrate and avg QP	2019-10-31 15:56:24 +02:00
Reima Hyvönen	252bab8820	Added prints to bitrate and AVG QP	2019-10-31 15:56:24 +02:00
Pauli Oikkonen	6d7a4f555c	Also remove 16x16 (A * B^T)^T matrix multiply. Can be done using (B * A^T) instead, it's the exact same	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	2c2deb2366	Tidy AVX2 32x32 matrix multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	98ad78b333	Tidy the old AVX2 32x32 matrix multiply. It was actually a very good algorithm, just looked messy!	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	4a921cbdb5	Retain data as much in YMM registers as possible. This seems to make it a whole lot quicker	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	9589baccac	Add more swap filename patterns to .gitignore	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ac4d710e23	Unroll 32x32 matrix multiply, use all regs	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	a58608d0b8	Remove totally unnecessary (A * B^T)^T 32x32 multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	043f53539f	Implement a streamlined matrix-multiply 32x32 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e9da2d851b	Tidy 32x32 fast DCT's helper functions	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e382339182	Implement fast (butterfly) 32x32 DCT in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	b5962dadac	Tidy indentation in AVX2 16x16 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	36a8f89025	Fine-tune 16x16 AVX2 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	2b95d9cdd6	Align all DCT test buffers to 32 bytes. Now that most AVX2 DCTs use MOVDQA instead of MOVDQU, also adapt the tests to that.	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ca9409de2b	Implement 16x16 DCT as butterfly algorithm in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	7c69a26717	Use aligned loads and stores for AVX2 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e9c65dca6	Align DCT matrices and temp transform buffers	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	148a150522	Align DCT source and dest blocks to cache line	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e60bbf6a6	Slightly tune 16x16 forward DCT. Use an array of __m256i's to store temporary value, essentially letting the compiler enforce alignment and use aligned loads and stores.	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	c0cc0e8a75	Optimize 16x16 multiply by only slicing right mat once	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e463d27f22	Implement streamlined generic 16x16 matrix multiply. It can't be this fast for real, can it?	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	beb85ce9d6	Reorder parameters for 8x8 matrix multiplies	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	292af62256	Implement tailored 16x16 forward DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	30ce461d98	Redo 4x4 matrix multiplication	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	07970ea82f	Streamline by-the-book 8x8 matrix multiplication. Also chop up the forward transform into two tailored multiply functions	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	7ec7ab3361	Implement a tailored AVX2 8x8 DCT	2019-10-28 16:19:42 +02:00
Marko Viitanen	ad7c8d40bc	Merge pull request #247 from pkubaj/master: Fix build on powerpc64 with LLVM	2019-09-12 16:11:19 +03:00
pkubaj	1d7fcf4227	Fix build on powerpc64 with LLVM	2019-09-12 15:05:00 +02:00
mercat	0de567bfa4	Fix memory leak	2019-09-12 09:45:32 +03:00
mercat	fa116de619	Add static	2019-09-11 16:18:12 +03:00
mercat	5cb2fbba16	Merge branch 'ML-cplx_red_ICIP' of gitlab.tut.fi:TIE/ultravideo/kvazaar into ML-cplx_red_ICIP	2019-09-11 16:12:47 +03:00
mercat	b8753a9293	Fucking INLINE fixed	2019-09-11 16:12:07 +03:00
mercat	b855144e68	INLINE fix	2019-09-11 16:12:07 +03:00
mercat	694337b803	Add const and more const	2019-09-11 16:12:07 +03:00
mercat	21c07638ed	Remove const into kvz_init_constraint.	2019-09-11 16:12:06 +03:00


2986 commits