Commit graph

2557 commits

Author SHA1 Message Date
Arttu Ylä-Outinen 6653f06dd0 Only compute GOP layer weights when RC is enabled 2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen c8fff1e0d6 Use a larger number of bits for POC lsb when needed
Changes the number of bits used for coding the least significant bits of
the POC based on the GOP size.
2020-02-15 22:36:56 +02:00
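The commit above does not say how the POC LSB width is derived from the GOP size, so the following is only a sketch of one plausible rule, not Kvazaar's code. It assumes the number of bits must keep MaxPicOrderCntLsb / 2 strictly larger than the GOP length, and that the HEVC range of 4 to 16 bits (log2_max_pic_order_cnt_lsb_minus4 = 0..12) applies. The helper `poc_lsb_bits_for_gop` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper (not from Kvazaar): smallest POC LSB bit count,
 * within HEVC's 4..16 range, such that MaxPicOrderCntLsb / 2 = 2^(bits-1)
 * is strictly greater than the GOP length. */
static int poc_lsb_bits_for_gop(int gop_len)
{
    assert(gop_len > 0);
    int bits = 4; /* HEVC minimum: log2_max_pic_order_cnt_lsb_minus4 = 0 */
    while (bits < 16 && (1 << (bits - 1)) <= gop_len) {
        bits++;
    }
    return bits;
}
```

Under this assumed rule, GOP lengths 1 to 7 keep the 4-bit minimum, an 8-GOP needs 5 bits and a 16-GOP needs 6 bits.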
Arttu Ylä-Outinen d757a832c2 Change GOP QP offset handling to match HM
Adds fields qp_model_scale and qp_model_offset to kvz_gop_config and
intra_qp_offset to kvz_config.
2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen f37dcd5879 Move GOP definition to a separate file
Moves definition of the 8-GOP from cfg.c to gop.h.
2020-02-15 22:36:55 +02:00
Ari Lemmetti 6e1007a3e7 Get rid of LAMBA! (Commit #3000) 2020-02-15 22:32:52 +02:00
Ari Lemmetti 0c02e71b43 Remove minor error from readme 2020-02-15 22:29:08 +02:00
Ari Lemmetti 9a0236bb4e Add option 'zero-coeff-rdo' 2020-02-04 21:26:29 +02:00
Ari Lemmetti 886ff36d12 Initial implementation of fast bipred. 2020-02-04 15:46:23 +02:00
Ari Lemmetti 3c7dd0752f Remove the broken "no mov" branch.
Causes hash mismatches for example in SlideShow sequence.
2020-02-03 15:26:31 +02:00
RLamm bf8941ddb8 Added comment about partial-coding usage 2020-01-31 16:19:48 +02:00
RLamm b8488ab48d Changed "partial-coding" variables to uint32_t 2020-01-31 16:02:29 +02:00
RLamm 76e3249754 Changed parameter "slicer" to "partial-coding" to avoid confusion. 2020-01-31 14:22:32 +02:00
RLamm 30d5df40c5 Custom headers for the distributed coding 2020-01-29 15:54:49 +02:00
Pauli Oikkonen c3d9e97e9f Fix VS build 2019-12-12 18:34:55 +02:00
Pauli Oikkonen 7f238ca299 Remove debug print functions
Whoops
2019-12-12 18:19:31 +02:00
Pauli Oikkonen eefb5e50b3 De-inline pred_filtered_dc functions, shouldn't make much difference though 2019-12-12 17:30:00 +02:00
Pauli Oikkonen 169314de4f 32x32 filtered DC prediction in AVX2 2019-12-11 18:17:06 +02:00
Pauli Oikkonen fb2481b7e4 16x16 filtered DC implemented in AVX2 2019-12-10 15:54:50 +02:00
Pauli Oikkonen da370ea36d Implement AVX2 8x8 filtered DC algorithm 2019-11-28 14:10:10 +02:00
Pauli Oikkonen 5d9b7019ca Implement a 4x4 filtered DC pred function 2019-11-26 17:05:54 +02:00
Pauli Oikkonen f1485ab087 Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes? 2019-11-25 15:20:29 +02:00
Pauli Oikkonen 979d66031c Create a strategy out of intra_pred_filtered_dc 2019-11-19 14:50:31 +02:00
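The filtered DC commits above add AVX2 versions for fixed block sizes. For comparison, here is a minimal scalar sketch of the HEVC DC mode with its edge filter, under stated assumptions. It is not the Kvazaar strategy function: the name `filtered_dc_ref`, the 8-bit `uint8_t` samples and the separate `top`/`left` reference arrays are all assumptions. In HEVC, the DC value is the rounded mean of the top and left references. The corner sample becomes (left[0] + 2*dc + top[0] + 2) >> 2, and the rest of the first row and first column are (ref + 3*dc + 2) >> 2. The standard applies this filter only to luma blocks smaller than 32x32, so the caller decides when to use it. The assert accepts 4x4 through 32x32 only because those are the sizes the commits cover.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference for HEVC DC prediction with edge filtering.
 * top[x]  : reconstructed sample above the block, x = 0..width-1
 * left[y] : reconstructed sample left of the block, y = 0..width-1
 * dst     : width x width output, row-major. 8-bit samples assumed. */
static void filtered_dc_ref(const uint8_t *top, const uint8_t *left,
                            int log2_width, uint8_t *dst)
{
    assert(log2_width >= 2 && log2_width <= 5);
    const int width = 1 << log2_width;

    /* Rounded average of the 2 * width reference samples. */
    int sum = 0;
    for (int i = 0; i < width; i++)
        sum += top[i] + left[i];
    const int dc = (sum + width) >> (log2_width + 1);

    /* Fill the whole block with the DC value. */
    for (int i = 0; i < width * width; i++)
        dst[i] = (uint8_t)dc;

    /* Edge filter: corner sample, then first row and first column. */
    dst[0] = (uint8_t)((left[0] + 2 * dc + top[0] + 2) >> 2);
    for (int x = 1; x < width; x++)
        dst[x] = (uint8_t)((top[x] + 3 * dc + 2) >> 2);
    for (int y = 1; y < width; y++)
        dst[y * width] = (uint8_t)((left[y] + 3 * dc + 2) >> 2);
}
```

A scalar version like this is mainly useful as a reference to check SIMD implementations against, block size by block size.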
Pauli Oikkonen fa4bb86406 Optimize intra_pred_planar_avx2 for 4x4 blocks 2019-11-19 13:39:02 +02:00
Pauli Oikkonen 4761d228f9 Start to vectorize the 4x4 loop 2019-11-15 17:32:40 +02:00
Pauli Oikkonen 8d45ab4951 Stupidify the 4x4 planar loop for vectorization 2019-11-14 17:14:04 +02:00
Pauli Oikkonen 6f13f6525c Merge branch 'new_prints' 2019-11-07 17:04:21 +02:00
mercat 57e8c3ebc2 Merge branch 'ML-cplx_red_ICIP' 2019-11-07 13:25:47 +02:00
Pauli Oikkonen 558f0ec401 Mbps, not mbps 2019-11-05 18:06:00 +02:00
Pauli Oikkonen 2edf533925 Tidy the end report printing
Also fix a bug with non-integer target FPS
2019-11-05 17:20:00 +02:00
Pauli Oikkonen c7313ce567 Store AVG QP information in encmain 2019-11-04 17:08:07 +02:00
Reima Hyvönen 80575c59bf Some updates done to get the right bitrate and avg QP 2019-10-31 15:56:24 +02:00
Reima Hyvönen 252bab8820 Added prints for bitrate and AVG QP 2019-10-31 15:56:24 +02:00
Pauli Oikkonen 6d7a4f555c Also remove 16x16 (A * B^T)^T matrix multiply
Can be done using (B * A^T) instead, it's the exact same
2019-10-28 16:19:42 +02:00
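The commit message relies on the identity (A * B^T)^T = B * A^T. Element-wise, both sides equal the sum over k of B[i][k] * A[j][k]. The small C check below confirms this numerically. The function names and the row-major flat-array layout are illustrative assumptions and are unrelated to the AVX2 code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_N 32

/* c = a * b^T for n x n row-major int matrices:
 * c[i][j] = sum_k a[i][k] * b[j][k]. */
static void mul_abt(const int *a, const int *b, int *c, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int sum = 0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[j * n + k];
            c[i * n + j] = sum;
        }
}

/* Returns true if (A * B^T)^T equals B * A^T element-wise. */
static bool transpose_identity_holds(const int *a, const int *b, int n)
{
    assert(n > 0 && n <= MAX_N);
    int abt[MAX_N * MAX_N], bat[MAX_N * MAX_N];
    mul_abt(a, b, abt, n); /* A * B^T */
    mul_abt(b, a, bat, n); /* B * A^T */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (abt[j * n + i] != bat[i * n + j]) /* compare transpose */
                return false;
    return true;
}
```

Because the identity holds for any A and B, a dedicated transposed-product routine is redundant: swapping the operands of the existing A * B^T kernel gives the same result.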
Pauli Oikkonen 2c2deb2366 Tidy AVX2 32x32 matrix multiply 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 98ad78b333 Tidy the old AVX2 32x32 matrix multiply
It was actually a very good algorithm, just looked messy!
2019-10-28 16:19:42 +02:00
Pauli Oikkonen 4a921cbdb5 Retain data as much in YMM registers as possible
This seems to make it a whole lot quicker
2019-10-28 16:19:42 +02:00
Pauli Oikkonen ac4d710e23 Unroll 32x32 matrix multiply, use all regs 2019-10-28 16:19:42 +02:00
Pauli Oikkonen a58608d0b8 Remove totally unnecessary (A * B^T)^T 32x32 multiply 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 043f53539f Implement a streamlined matrix-multiply 32x32 DCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen e9da2d851b Tidy 32x32 fast DCT's helper functions 2019-10-28 16:19:42 +02:00
Pauli Oikkonen e382339182 Implement fast (butterfly) 32x32 DCT in AVX2 2019-10-28 16:19:42 +02:00
Pauli Oikkonen b5962dadac Tidy indentation in AVX2 16x16 iDCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 36a8f89025 Fine-tune 16x16 AVX2 iDCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen ca9409de2b Implement 16x16 DCT as butterfly algorithm in AVX2 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 7c69a26717 Use aligned loads and stores for AVX2 DCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 8e9c65dca6 Align DCT matrices and temp transform buffers 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 148a150522 Align DCT source and dest blocks to cache line 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 8e60bbf6a6 Slightly tune 16x16 forward DCT
Use an array of __m256i's to store temporary values, essentially letting
the compiler enforce alignment and use aligned loads and stores.
2019-10-28 16:19:42 +02:00
Pauli Oikkonen c0cc0e8a75 Optimize 16x16 multiply by only slicing right mat once 2019-10-28 16:19:42 +02:00
Pauli Oikkonen e463d27f22 Implement streamlined generic 16x16 matrix multiply
It can't be this fast for real, can it?
2019-10-28 16:19:42 +02:00