Pauli Oikkonen
|
314f5b0e1f
|
Rename 16x2b cmpgt function, comment it better, optimize it slightly
Eliminate an unnecessary bit masking to make it even more messy
|
2019-02-04 14:44:32 +02:00 |
|
Pauli Oikkonen
|
d8ff6a6459
|
Fix _andn_u32 to work on old Visual Studio
|
2019-02-01 15:34:42 +02:00 |
|
Pauli Oikkonen
|
45ac6e6d03
|
Tidy pack_16x16b_to_16x2b comments
|
2019-01-03 16:37:05 +02:00 |
|
Pauli Oikkonen
|
016eb014ad
|
Move packing 16x16b -> 16x2b into separate function
|
2018-12-20 10:51:44 +02:00 |
|
Pauli Oikkonen
|
9aaa6f260d
|
Fixes to enable portability
|
2018-12-18 20:42:09 +02:00 |
|
Pauli Oikkonen
|
2fdbbe9730
|
Move CG reordering code from quant-avx2 to shared header
|
2018-12-18 19:42:18 +02:00 |
|
Pauli Oikkonen
|
d02207306d
|
Create a header file for shared AVX2 code
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
361bf0c7db
|
Precompute >=2 coeff encoding loop with 2-bit arithmetic
Who needs 16x16b vectors when you can do practically the same with
16x2b pseudovectors in 32-bit general purpose registers!
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
f66cb23d5b
|
Optimize greater1 encoding loop
Calculating the c1 variable need not be a serial operation!
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
8c8b791c35
|
Vectorize kvz_context_get_sig_ctx_inc
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
033261eb74
|
Eliminate two branches using bit magic
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
c4434e8d04
|
Scan CGs in forward order to simplify finding last significant
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
efd097f5a5
|
Vectorize the coeff group loop to some extent
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
a01362e638
|
use the efficient method of reordering raster->scan
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
50a888e789
|
Use the efficient method to find first and last nz coeffs in block
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
7e9203f566
|
Scan coeff groups in scan order to help find last significant one
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
9a5a6fdbc7
|
Simplify two ifs in encode_coeff_nxn-avx2
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
37a2a8bac8
|
See if loop can be optimized by rearranging
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
584f2f74b6
|
Vectorize significant coeff group scanning loop
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
1bfed73221
|
Add AVX2 strategy for encode_coding_tree
|
2018-12-18 19:41:09 +02:00 |
|
Reima Hyvönen
|
1fcc5c6a8d
|
Merge branch 'bipred_recon'
|
2018-12-11 09:59:35 +02:00 |
|
Reima Hyvönen
|
e4a10880f3
|
Added case 12 to bipred_recon no mov
|
2018-12-11 09:52:17 +02:00 |
|
Marko Viitanen
|
a4f3968e52
|
Fix Visual Studio errors by initializing some variables used in AVX2 signhiding
|
2018-12-11 09:33:26 +02:00 |
|
Pauli Oikkonen
|
c465578048
|
Add a descriptive comment to coefficient reordering
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
f78bf2ebcb
|
Optimize q_coefs usage for indexed fetch
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
d9591f1b49
|
Eliminate midway buffering of reordered coefs
TODO: For some mysterious reason seems slightly slower than the
buffered one
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
7fe454c51f
|
Optimize get_cheapest_alternative()
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
6bbd3e5a44
|
Optimize rearrange_512 function
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
cb8209d1b3
|
Vectorize transform coefficient reordering loop
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
7cf4c7ae5f
|
Rename "reduce" functions to hsum
That's what the functions fundamentally do anyway
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
316cd8a846
|
Fix ALIGNED keyword and grow alignment to 64B
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
1befc69a4c
|
Implement sign bit hiding in AVX2
|
2018-12-03 15:36:32 +02:00 |
|
Reima Hyvönen
|
f8696b54a4
|
Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks whose width is not equal to 8 (e.g. width = 12)
|
2018-11-20 17:09:19 +02:00 |
|
Reima Hyvönen
|
710ba288db
|
Chroma has some problems
|
2018-11-15 16:42:48 +02:00 |
|
Ari Lemmetti
|
a832206bb6
|
Replace 32-bit incompatible intrinsics
|
2018-11-12 18:54:33 +02:00 |
|
Ari Lemmetti
|
5c774c4105
|
Rewrite most of FME and interpolation filters
Changes had to break a lot of stuff and were just squashed into this horrible code dump
|
2018-11-08 20:21:16 +02:00 |
|
Reima Hyvönen
|
7406c33a42
|
Some more cleaning
|
2018-10-26 12:25:18 +03:00 |
|
Reima Hyvönen
|
4c71546b2e
|
Cleaned up some code
|
2018-10-26 12:19:44 +03:00 |
|
Reima Hyvönen
|
4fe3909e48
|
Switched luma to use 32-bit ints instead of 16-bit ones
|
2018-10-24 18:24:46 +03:00 |
|
Reima Hyvönen
|
381e786e10
|
Trying to find the bug in luma
|
2018-10-11 18:08:41 +03:00 |
|
Reima Hyvönen
|
2f5f81bac3
|
removed the non-optimized bipred function
|
2018-10-09 11:19:23 +03:00 |
|
Reima Hyvönen
|
212a8e68fa
|
Modified to avoid memory overflow, still some bug inside luma
|
2018-10-02 20:23:32 +03:00 |
|
Reima Hyvönen
|
896034b7cf
|
Renamed some functions back
|
2018-08-28 15:31:10 +03:00 |
|
Reima Hyvönen
|
7de5c74434
|
Updated bipred_recon to work faster
|
2018-08-28 15:12:31 +03:00 |
|
Reima Hyvönen
|
2ca99a44e8
|
Updated shuffle operation to be in right order
|
2018-08-27 18:16:38 +03:00 |
|
Reima Hyvönen
|
508b218a12
|
some modifications made to prevent reading too much
|
2018-08-14 10:50:39 +03:00 |
|
Reima Hyvönen
|
1d935ee888
|
some useless stuff removed
|
2018-08-13 16:47:11 +03:00 |
|
Reima Hyvönen
|
ce3ac4c05e
|
some modifications to no_mov
|
2018-08-13 16:41:02 +03:00 |
|
Reima Hyvönen
|
15a613ae94
|
test if no_mov breaks testing
|
2018-08-13 16:02:56 +03:00 |
|
Reima Hyvönen
|
97a2049e58
|
removed pointer declaration out from switch
|
2018-08-10 16:42:26 +03:00 |
|