Pauli Oikkonen
|
bed93fb7f5
|
Merge branch 'sad-avx2'
|
2019-01-10 17:48:09 +02:00 |
|
Pauli Oikkonen
|
26e1b2c783
|
Use (u)int32_t instead of (unsigned) int in reg_sad_sse41
|
2019-01-10 14:37:04 +02:00 |
|
Pauli Oikkonen
|
3a1f2eb752
|
Prefer SSE4.1 implementation of SAD over AVX2
It seems that the 128-bit wide version consistently outperforms the
256-bit one
|
2019-01-10 13:48:55 +02:00 |
|
Pauli Oikkonen
|
9b24d81c6a
|
Use SSE instead of AVX for small widths
Highly dubious if this will help performance at all
|
2019-01-07 20:12:13 +02:00 |
|
Pauli Oikkonen
|
b2176bf72a
|
Optimize SSE4.1 version of SAD
Make it use the same vblend trick as AVX2. Interestingly, on my test
setup this seems to be faster than the same code using 256-bit AVX
vectors.
|
2019-01-07 19:40:57 +02:00 |
|
Pauli Oikkonen
|
887d7700a8
|
Modify AVX2 SAD to mask data by byte granularity in AVX registers
Avoids using any SAD calculations narrower than 256 bits, and
simplifies the code. Also improves execution speed
|
2019-01-07 18:53:15 +02:00 |
|
Pauli Oikkonen
|
7585f79a71
|
AVX2-ize SAD calculation
Performance is no better than SSE though
|
2019-01-07 16:26:24 +02:00 |
|
Pauli Oikkonen
|
ab3dc58df6
|
Copy SAD SSE4.1 impl to AVX2
|
2019-01-03 18:31:57 +02:00 |
|
Pauli Oikkonen
|
45ac6e6d03
|
Tidy pack_16x16b_to_16x2b comments
|
2019-01-03 16:37:05 +02:00 |
|
Ari Lemmetti
|
cd818db724
|
Add missing quantization and residual in cost calculation (inter rd=2).
|
2018-12-21 15:55:29 +02:00 |
|
Pauli Oikkonen
|
016eb014ad
|
Move packing 16x16b -> 16x2b into separate function
|
2018-12-20 10:51:44 +02:00 |
|
Ari Lemmetti
|
b234897e8a
|
Fix smp and amp blocks in fme and revert previous change.
Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc.
Calculate SATD on the 8x4, ... part
|
2018-12-19 21:30:53 +02:00 |
|
Pauli Oikkonen
|
3b635309a1
|
Add new files to Visual Studio project
|
2018-12-18 20:48:41 +02:00 |
|
Pauli Oikkonen
|
9aaa6f260d
|
Fixes to enable portability
|
2018-12-18 20:42:09 +02:00 |
|
Pauli Oikkonen
|
2fdbbe9730
|
Move CG reordering code from quant-avx2 to shared header
|
2018-12-18 19:42:18 +02:00 |
|
Pauli Oikkonen
|
d02207306d
|
Create a header file for shared AVX2 code
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
361bf0c7db
|
Precompute >=2 coeff encoding loop with 2-bit arithmetic
Who needs 16x16b vectors when you can do practically the same with
16x2b pseudovectors in 32-bit general purpose registers!
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
940b0e9e6a
|
Require BMI2 for AVX2 build
Any processor implementing AVX2 should also implement BMI2
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
f66cb23d5b
|
Optimize greater1 encoding loop
Calculating the c1 variable need not be a serial operation!
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
8c8b791c35
|
Vectorize kvz_context_get_sig_ctx_inc
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
033261eb74
|
Eliminate two branches using bit magic
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
c4434e8d04
|
Scan CGs in forward order to simplify finding last significant
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
efd097f5a5
|
Vectorize the coeff group loop to some extent
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
a01362e638
|
Use the efficient method of reordering raster->scan
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
50a888e789
|
Use the efficient method to find first and last nz coeffs in block
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
7e9203f566
|
Scan coeff groups in scan order to help find last significant one
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
9a5a6fdbc7
|
Simplify two ifs in encode_coeff_nxn-avx2
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
37a2a8bac8
|
See if loop can be optimized by rearranging
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
584f2f74b6
|
Vectorize significant coeff group scanning loop
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
1bfed73221
|
Add AVX2 strategy for encode_coding_tree
|
2018-12-18 19:41:09 +02:00 |
|
Pauli Oikkonen
|
c3a6f3112a
|
Add generic strategy group for encode_coding_tree
|
2018-12-18 19:41:09 +02:00 |
|
Marko Viitanen
|
1ef851ab4b
|
Disable FME on amp/smp blocks with width or height not divisible by 8
|
2018-12-18 10:28:21 +02:00 |
|
Joose Sainio
|
b71c5573f0
|
Merge branch 'rate_control_fix'
|
2018-12-17 12:39:27 +02:00 |
|
Marko Viitanen
|
fd2dc57759
|
Merge pull request #219 from trofi/master
x86 asm: mark stack as non-executable
|
2018-12-17 10:25:35 +02:00 |
|
Sergei Trofimovich
|
68a70e45a1
|
x86 asm: mark stack as non-executable
Gentoo's `scanelf` QA tool detects writable/executable stack
of assembly-written files as:
```
$ scanelf -qRa .
0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o
0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o
0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-sad.o
0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-satd.o
```
Normally the C compiler emits a non-executable stack marking (or the GNU
assembler does, via `-Wa,--noexecstack`).
The change adds a non-executable stack marking for yasm-based assembly files.
https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
|
2018-12-16 11:31:56 +00:00 |
|
Reima Hyvönen
|
83caffe5d9
|
Reverted project files back to VS 2013
|
2018-12-11 10:06:35 +02:00 |
|
Reima Hyvönen
|
1fcc5c6a8d
|
Merge branch 'bipred_recon'
|
2018-12-11 09:59:35 +02:00 |
|
Reima Hyvönen
|
e4a10880f3
|
Added case 12 to bipred_recon no mov
|
2018-12-11 09:52:17 +02:00 |
|
Marko Viitanen
|
a4f3968e52
|
Fix Visual Studio errors by initializing some variables used in AVX2 signhiding
|
2018-12-11 09:33:26 +02:00 |
|
Ari Lemmetti
|
ac943147e3
|
Calculate satd cost for whole non-square blocks as well.
|
2018-12-10 17:04:29 +02:00 |
|
Pauli Oikkonen
|
c2906de114
|
Merge branch 'sign-hiding-avx2' into 'master'
Sign hiding avx2
See merge request TIE/ultravideo/kvazaar!2
|
2018-12-10 14:24:40 +02:00 |
|
Pauli Oikkonen
|
b6b89672fd
|
Include Vim swap files in .gitignore
|
2018-12-03 15:38:52 +02:00 |
|
Pauli Oikkonen
|
c465578048
|
Add a descriptive comment to coefficient reordering
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
f78bf2ebcb
|
Optimize q_coefs usage for indexed fetch
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
d9591f1b49
|
Eliminate midway buffering of reordered coefs
TODO: For some mysterious reason seems slightly slower than the
buffered one
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
7fe454c51f
|
Optimize get_cheapest_alternative()
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
6bbd3e5a44
|
Optimize rearrange_512 function
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
cb8209d1b3
|
Vectorize transform coefficient reordering loop
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
7cf4c7ae5f
|
Rename "reduce" functions to hsum
That's what the functions fundamentally do anyway
|
2018-12-03 15:36:32 +02:00 |
|
Pauli Oikkonen
|
316cd8a846
|
Fix ALIGNED keyword and grow alignment to 64B
|
2018-12-03 15:36:32 +02:00 |
|