Marko Viitanen
|
17a53230fd
|
Code cleanup, remove unused arrays and remove tabs
|
2019-11-18 09:01:23 +02:00 |
|
Pauli Oikkonen
|
4761d228f9
|
Start to vectorize the 4x4 loop
|
2019-11-15 17:32:40 +02:00 |
|
Pauli Oikkonen
|
8d45ab4951
|
Stupidify the 4x4 planar loop for vectorization
|
2019-11-14 17:14:04 +02:00 |
|
Marko Viitanen
|
91528f3292
|
Update contexts
|
2019-11-14 13:46:51 +02:00 |
|
Marko Viitanen
|
b309ed90be
|
Fix NAL packet and missing fields in SPS
|
2019-11-14 09:21:11 +02:00 |
|
Marko Viitanen
|
74514981a9
|
Fixed PPS, SPS and slice headers and NAL unit types
|
2019-11-13 15:59:36 +02:00 |
|
Joose Sainio
|
c759c138ed
|
Prepare the rc data structure to be shared among all frame encoders
|
2019-11-13 11:56:25 +02:00 |
|
Joose Sainio
|
cdb7c851a4
|
Fix weight calculation
|
2019-11-13 08:55:31 +02:00 |
|
Joose Sainio
|
b9b01f8036
|
WPP with threading
|
2019-11-12 12:12:57 +02:00 |
|
Joose Sainio
|
615973adca
|
should enable threading with wpp when owf is not used
|
2019-11-12 09:03:00 +02:00 |
|
Pauli Oikkonen
|
6f13f6525c
|
Merge branch 'new_prints'
|
2019-11-07 17:04:21 +02:00 |
|
Joose Sainio
|
d353f7dd1a
|
Disable debug prints, fix multiple bugs in the calculation
|
2019-11-07 15:08:57 +02:00 |
|
mercat
|
57e8c3ebc2
|
Merge branch 'ML-cplx_red_ICIP'
|
2019-11-07 13:25:47 +02:00 |
|
Pauli Oikkonen
|
558f0ec401
|
Mbps, not mbps
|
2019-11-05 18:06:00 +02:00 |
|
Pauli Oikkonen
|
2edf533925
|
Tidy the end report printing
Also fix a bug with non-integer target FPS
|
2019-11-05 17:20:00 +02:00 |
|
Joose Sainio
|
408fd4ccb6
|
Fix lambda and QP calculation for intra frames
Also fixes a bug in selecting the clip-neighbor lambda and clip-neighbor QP
for inter frames
|
2019-11-05 10:51:39 +02:00 |
|
Pauli Oikkonen
|
c7313ce567
|
Store AVG QP information in encmain
|
2019-11-04 17:08:07 +02:00 |
|
Reima Hyvönen
|
80575c59bf
|
Some updates to get the right bitrate and avg QP
|
2019-10-31 15:56:24 +02:00 |
|
Reima Hyvönen
|
252bab8820
|
Added prints for bitrate and AVG QP
|
2019-10-31 15:56:24 +02:00 |
|
Pauli Oikkonen
|
6d7a4f555c
|
Also remove 16x16 (A * B^T)^T matrix multiply
Can be done using (B * A^T) instead, since (A * B^T)^T = B * A^T exactly
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
2c2deb2366
|
Tidy AVX2 32x32 matrix multiply
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
98ad78b333
|
Tidy the old AVX2 32x32 matrix multiply
It was actually a very good algorithm, just looked messy!
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
4a921cbdb5
|
Retain data as much in YMM registers as possible
This seems to make it a whole lot quicker
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
9589baccac
|
Add more swap filename patterns to .gitignore
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
ac4d710e23
|
Unroll 32x32 matrix multiply, use all regs
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
a58608d0b8
|
Remove totally unnecessary (A * B^T)^T 32x32 multiply
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
043f53539f
|
Implement a streamlined matrix-multiply 32x32 DCT
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
e9da2d851b
|
Tidy 32x32 fast DCT's helper functions
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
e382339182
|
Implement fast (butterfly) 32x32 DCT in AVX2
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
b5962dadac
|
Tidy indentation in AVX2 16x16 iDCT
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
36a8f89025
|
Fine-tune 16x16 AVX2 iDCT
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
2b95d9cdd6
|
Align all DCT test buffers to 32 bytes
Now that most AVX2 DCTs use MOVDQA instead of MOVDQU, adapt the
tests accordingly.
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
ca9409de2b
|
Implement 16x16 DCT as butterfly algorithm in AVX2
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
7c69a26717
|
Use aligned loads and stores for AVX2 DCT
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
8e9c65dca6
|
Align DCT matrices and temp transform buffers
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
148a150522
|
Align DCT source and dest blocks to cache line
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
8e60bbf6a6
|
Slightly tune 16x16 forward DCT
Use an array of __m256i's to store temporary values, essentially letting
the compiler enforce alignment and use aligned loads and stores.
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
c0cc0e8a75
|
Optimize 16x16 multiply by only slicing right mat once
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
e463d27f22
|
Implement streamlined generic 16x16 matrix multiply
It can't be this fast for real, can it?
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
beb85ce9d6
|
Reorder parameters for 8x8 matrix multiplies
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
292af62256
|
Implement tailored 16x16 forward DCT
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
30ce461d98
|
Redo 4x4 matrix multiplication
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
07970ea82f
|
Streamline by-the-book 8x8 matrix multiplication
Also chop up the forward transform into two tailored multiply functions
|
2019-10-28 16:19:42 +02:00 |
|
Pauli Oikkonen
|
7ec7ab3361
|
Implement a tailored AVX2 8x8 DCT
|
2019-10-28 16:19:42 +02:00 |
|
Joose Sainio
|
372934c7db
|
Fix division by zero
|
2019-10-10 16:35:56 +03:00 |
|
Joose Sainio
|
9bdfdeaf5c
|
Rest of the owl
|
2019-10-09 15:48:58 +03:00 |
|
Joose Sainio
|
1ba8525faf
|
WIP
|
2019-10-09 10:35:07 +03:00 |
|
Joose Sainio
|
19496d2692
|
?
|
2019-10-03 14:50:11 +03:00 |
|
Joose Sainio
|
4b111e339e
|
fix a couple of bugs in the implementation; bit calculation still seems a bit off
|
2019-10-01 15:08:39 +03:00 |
|
Joose Sainio
|
84615e406a
|
fix compiler warnings
|
2019-09-27 14:20:08 +03:00 |
|