Commit graph

3173 commits

Author SHA1 Message Date
Pauli Oikkonen 2b95d9cdd6 Align all DCT test buffers to 32 bytes
Now that most AVX2 DCTs use MOVDQA instead of MOVDQU, also adapt the
tests to that.
2019-10-28 16:19:42 +02:00
Pauli Oikkonen ca9409de2b Implement 16x16 DCT as butterfly algorithm in AVX2 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 7c69a26717 Use aligned loads and stores for AVX2 DCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 8e9c65dca6 Align DCT matrices and temp transform buffers 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 148a150522 Align DCT source and dest blocks to cache line 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 8e60bbf6a6 Slightly tune 16x16 forward DCT
Use an array of __m256i's to store temporary value, essentially letting
the compiler enforce alignment and use aligned loads and stores.
2019-10-28 16:19:42 +02:00
Pauli Oikkonen c0cc0e8a75 Optimize 16x16 multiply by only slicing right mat once 2019-10-28 16:19:42 +02:00
Pauli Oikkonen e463d27f22 Implement streamlined generic 16x16 matrix multiply
It can't be this fast for real, can it?
2019-10-28 16:19:42 +02:00
Pauli Oikkonen beb85ce9d6 Reorder parameters for 8x8 matrix multiplies 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 292af62256 Implement tailored 16x16 forward DCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 30ce461d98 Redo 4x4 matrix multiplication 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 07970ea82f Streamline by-the-book 8x8 matrix multiplication
Also chop up the forward transform into two tailored multiply functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen 7ec7ab3361 Implement a tailored AVX2 8x8 DCT 2019-10-28 16:19:42 +02:00
Joose Sainio 372934c7db Fix division by zero 2019-10-10 16:35:56 +03:00
Joose Sainio 9bdfdeaf5c Rest of the owl 2019-10-09 15:48:58 +03:00
Joose Sainio 1ba8525faf WIP 2019-10-09 10:35:07 +03:00
Joose Sainio 19496d2692 ? 2019-10-03 14:50:11 +03:00
Joose Sainio 4b111e339e fix couple of bugs in the implementation, bit calculation seems still bit off 2019-10-01 15:08:39 +03:00
Joose Sainio 84615e406a fix compiler warnings 2019-09-27 14:20:08 +03:00
Joose Sainio 14b7a75713 Call the new functions and fix bugs 2019-09-27 14:14:24 +03:00
Joose Sainio ef74bfb182 unify naming 2019-09-27 10:16:21 +03:00
Joose Sainio e36f481bda qp calculation for frame 2019-09-27 09:05:40 +03:00
Joose Sainio 47019ca1cd intra ck update 2019-09-26 16:04:53 +03:00
Joose Sainio 7c8f4da7cb Update c and k except after first intra 2019-09-26 13:09:28 +03:00
Joose Sainio 0577d481c1 CTU level code 2019-09-25 12:12:21 +03:00
Marko Viitanen ad7c8d40bc Merge pull request #247 from pkubaj/master
Fix build on powerpc64 with LLVM
2019-09-12 16:11:19 +03:00
pkubaj 1d7fcf4227 Fix build on powerpc64 with LLVM 2019-09-12 15:05:00 +02:00
mercat 0de567bfa4 Fixe memory leak 2019-09-12 09:45:32 +03:00
mercat fa116de619 Add static 2019-09-11 16:18:12 +03:00
mercat 5cb2fbba16 Merge branch 'ML-cplx_red_ICIP' of gitlab.tut.fi:TIE/ultravideo/kvazaar into ML-cplx_red_ICIP 2019-09-11 16:12:47 +03:00
mercat b8753a9293 Fucking INLINE fixed 2019-09-11 16:12:07 +03:00
mercat b855144e68 INLINE fixe 2019-09-11 16:12:07 +03:00
mercat 694337b803 Add const and more const 2019-09-11 16:12:07 +03:00
mercat 21c07638ed Remove const into kvz_init_constraint. 2019-09-11 16:12:06 +03:00
mercat 2bca507abe Clean version of machine learning constraint code. (ICIP paper) 2019-09-11 16:12:06 +03:00
Alexandre Mercat 0f4b7be6ee First version of ML ICIP code for master 2019-09-11 16:12:06 +03:00
mercat 808eb4ff96 Fucking INLINE fixed 2019-09-11 16:08:31 +03:00
mercat 35fd556321 INLINE fixe 2019-09-11 16:05:31 +03:00
mercat 5beb23d91c Add const and more const 2019-09-11 16:03:03 +03:00
mercat 6cda8036c9 Remove const into kvz_init_constraint. 2019-09-11 15:57:15 +03:00
mercat 1dac29d9a0 Clean version of machine learning constraint code. (ICIP paper) 2019-09-11 15:49:56 +03:00
Marko Viitanen 4007485420 Update the ffmpeg version used in the tests 2019-09-11 14:52:30 +03:00
Marko Viitanen da5dca057d Change libtool path in tests to fix travis builds 2019-09-11 09:33:43 +03:00
Pauli Oikkonen 99597b828a Work around the ancient Win32 calling convention hassle
See if this'll work now
2019-09-06 13:14:42 +03:00
Pauli Oikkonen c5ca18950c Revert "Revert to 6924d90052 due to broken visual studio build"
This reverts commit 1dd0619bd7.
2019-09-05 18:21:55 +03:00
Pauli Oikkonen 55529decd5 Implement _mm256_insert_epi32 and extract pseudo-ops
Visual Studio headers apparently lack these guys
2019-09-05 18:20:52 +03:00
Ari Lemmetti 4e94d60552 Merge branch 'smp-merge-analysis' 2019-09-03 16:47:07 +03:00
Ari Lemmetti 147378e1f9 Prevent 8x4 and 4x8 bipred in merge analysis 2019-09-03 16:32:50 +03:00
Ari Lemmetti ef1fdbf259 Separate prediction of single PU/PB from CU/CB 2019-09-03 16:32:50 +03:00
Joose Sainio 7d2737bdf6 WIP picture lambda calculation 2019-09-03 11:03:35 +03:00