Commit graph

3082 commits

Author SHA1 Message Date
Pauli Oikkonen 314f5b0e1f Rename 16x2b cmpgt function, comment it better, optimize it slightly
Eliminate an unnecessary bit masking to make it even more messy
2019-02-04 14:44:32 +02:00
Pauli Oikkonen d8ff6a6459 Fix _andn_u32 to work on old Visual Studio 2019-02-01 15:34:42 +02:00
Pauli Oikkonen bed93fb7f5 Merge branch 'sad-avx2' 2019-01-10 17:48:09 +02:00
Pauli Oikkonen 26e1b2c783 Use (u)int32_t instead of (unsigned) int in reg_sad_sse41 2019-01-10 14:37:04 +02:00
Pauli Oikkonen 3a1f2eb752 Prefer SSE4.1 implementation of SAD over AVX2
It seems that the 128-bit wide version consistently outperforms the
256-bit one
2019-01-10 13:48:55 +02:00
Pauli Oikkonen 9b24d81c6a Use SSE instead of AVX for small widths
Highly dubious if this will help performance at all
2019-01-07 20:12:13 +02:00
Pauli Oikkonen b2176bf72a Optimize SSE4.1 version of SAD
Make it use the same vblend trick as AVX2. Interestingly, on my test
setup this seems to be faster than the same code using 256-bit AVX
vectors.
2019-01-07 19:40:57 +02:00
Pauli Oikkonen 887d7700a8 Modify AVX2 SAD to mask data by byte granularity in AVX registers
Avoids using any SAD calculations narrower than 256 bits, and
simplifies the code. Also improves execution speed
2019-01-07 18:53:15 +02:00
Pauli Oikkonen 7585f79a71 AVX2-ize SAD calculation
Performance is no better than SSE though
2019-01-07 16:26:24 +02:00
Pauli Oikkonen ab3dc58df6 Copy SAD SSE4.1 impl to AVX2 2019-01-03 18:31:57 +02:00
Pauli Oikkonen 45ac6e6d03 Tidy pack_16x16b_to_16x2b comments 2019-01-03 16:37:05 +02:00
Ari Lemmetti cd818db724 Add missing quantization and residual in cost calculation (inter rd=2). 2018-12-21 15:55:29 +02:00
Pauli Oikkonen 016eb014ad Move packing 16x16b -> 16x2b into separate function 2018-12-20 10:51:44 +02:00
Ari Lemmetti b234897e8a Fix smp and amp blocks in fme and revert previous change.
Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc.
Calculate SATD on the 8x4, ... part
2018-12-19 21:30:53 +02:00
Pauli Oikkonen 3b635309a1 Add new files to Visual Studio project 2018-12-18 20:48:41 +02:00
Pauli Oikkonen 9aaa6f260d Fixes to enable portability 2018-12-18 20:42:09 +02:00
Pauli Oikkonen 2fdbbe9730 Move CG reordering code from quant-avx2 to shared header 2018-12-18 19:42:18 +02:00
Pauli Oikkonen d02207306d Create a header file for shared AVX2 code 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 361bf0c7db Precompute >=2 coeff encoding loop with 2-bit arithmetic
Who needs 16x16b vectors when you can do practically the same with
16x2b pseudovectors in 32-bit general purpose registers!
2018-12-18 19:41:09 +02:00
Pauli Oikkonen 940b0e9e6a Require BMI2 for AVX2 build
Any processor implementing AVX2 should also implement BMI2
2018-12-18 19:41:09 +02:00
Pauli Oikkonen f66cb23d5b Optimize greater1 encoding loop
Calculating the c1 variable need not be a serial operation!
2018-12-18 19:41:09 +02:00
Pauli Oikkonen 8c8b791c35 Vectorize kvz_context_get_sig_ctx_inc 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 033261eb74 Eliminate two branches using bit magic 2018-12-18 19:41:09 +02:00
Pauli Oikkonen c4434e8d04 Scan CG's in forward order to simplify finding last significant 2018-12-18 19:41:09 +02:00
Pauli Oikkonen efd097f5a5 Vectorize the coeff group loop to some extent 2018-12-18 19:41:09 +02:00
Pauli Oikkonen a01362e638 use the efficient method of reordering raster->scan 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 50a888e789 Use the efficient method to find first and last nz coeffs in block 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 7e9203f566 Scan coeff groups in scan order to help find last significant one 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 9a5a6fdbc7 Simplify two ifs in encode_coeff_nxn-avx2 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 37a2a8bac8 See if loop can be optimized by rearranging 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 584f2f74b6 Vectorize significant coeff group scanning loop 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 1bfed73221 Add AVX2 strategy for encode_coding_tree 2018-12-18 19:41:09 +02:00
Pauli Oikkonen c3a6f3112a Add generic strategy group for encode_coding_tree 2018-12-18 19:41:09 +02:00
Marko Viitanen 1ef851ab4b Disable FME on amp/smp blocks with width or height not divisible by 8 2018-12-18 10:28:21 +02:00
Joose Sainio b71c5573f0 Merge branch 'rate_control_fix' 2018-12-17 12:39:27 +02:00
Marko Viitanen fd2dc57759
Merge pull request #219 from trofi/master
x86 asm: mark stack as non-executable
2018-12-17 10:25:35 +02:00
Sergei Trofimovich 68a70e45a1 x86 asm: mark stack as non-executable
Gentoo's `scanelf` QA tool detects writable/executable stack
of assembly-writtent files as:

```
$ scanelf -qRa  .
 0644 LE !WX --- ---     ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o
 0644 LE !WX --- ---     ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o
 0644 LE !WX --- ---     ./src/strategies/x86_asm/picture-x86-asm-sad.o
 0644 LE !WX --- ---     ./src/strategies/x86_asm/picture-x86-asm-satd.o
```

Normally C compiler emits non-executable stack marking (or GNU assembler
via `-Wa,--noexecstack`).

The change adds non-executable stack marking for yasm-based assmbly files.

https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2018-12-16 11:31:56 +00:00
Reima Hyvönen 83caffe5d9 Reverted project files back to VS 2013 2018-12-11 10:06:35 +02:00
Reima Hyvönen 1fcc5c6a8d Merge branch 'bipred_recon' 2018-12-11 09:59:35 +02:00
Reima Hyvönen e4a10880f3 Added case 12 to bipred_recon no mov 2018-12-11 09:52:17 +02:00
Marko Viitanen a4f3968e52 Fix Visual Studio errors by initializing some variables used in AVX2 signhiding 2018-12-11 09:33:26 +02:00
Ari Lemmetti ac943147e3 Calculate satd cost for whole non-square blocks as well. 2018-12-10 17:04:29 +02:00
Pauli Oikkonen c2906de114 Merge branch 'sign-hiding-avx2' into 'master'
Sign hiding avx2

See merge request TIE/ultravideo/kvazaar!2
2018-12-10 14:24:40 +02:00
Pauli Oikkonen b6b89672fd Include Vim swap files in .gitignore 2018-12-03 15:38:52 +02:00
Pauli Oikkonen c465578048 Add a descriptive comment to coefficient reordering 2018-12-03 15:36:32 +02:00
Pauli Oikkonen f78bf2ebcb Optimize q_coefs usage for indexed fetch 2018-12-03 15:36:32 +02:00
Pauli Oikkonen d9591f1b49 Eliminate midway buffering of reordered coefs
TODO: For some mysterious reason seems slightly slower than the
buffered one
2018-12-03 15:36:32 +02:00
Pauli Oikkonen 7fe454c51f Optimize get_cheapest_alternative() 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 6bbd3e5a44 Optimize rearrange_512 function 2018-12-03 15:36:32 +02:00
Pauli Oikkonen cb8209d1b3 Vectorize transform coefficient reordering loop 2018-12-03 15:36:32 +02:00