Commit graph

2375 commits

Author SHA1 Message Date
Pauli Oikkonen 9aaa6f260d Fixes to enable portability 2018-12-18 20:42:09 +02:00
Pauli Oikkonen 2fdbbe9730 Move CG reordering code from quant-avx2 to shared header 2018-12-18 19:42:18 +02:00
Pauli Oikkonen d02207306d Create a header file for shared AVX2 code 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 361bf0c7db Precompute >=2 coeff encoding loop with 2-bit arithmetic
Who needs 16x16b vectors when you can do practically the same with
16x2b pseudovectors in 32-bit general purpose registers!
2018-12-18 19:41:09 +02:00
Pauli Oikkonen 940b0e9e6a Require BMI2 for AVX2 build
Any processor implementing AVX2 should also implement BMI2
2018-12-18 19:41:09 +02:00
Pauli Oikkonen f66cb23d5b Optimize greater1 encoding loop
Calculating the c1 variable need not be a serial operation!
2018-12-18 19:41:09 +02:00
Pauli Oikkonen 8c8b791c35 Vectorize kvz_context_get_sig_ctx_inc 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 033261eb74 Eliminate two branches using bit magic 2018-12-18 19:41:09 +02:00
Pauli Oikkonen c4434e8d04 Scan CG's in forward order to simplify finding last significant 2018-12-18 19:41:09 +02:00
Pauli Oikkonen efd097f5a5 Vectorize the coeff group loop to some extent 2018-12-18 19:41:09 +02:00
Pauli Oikkonen a01362e638 use the efficient method of reordering raster->scan 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 50a888e789 Use the efficient method to find first and last nz coeffs in block 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 7e9203f566 Scan coeff groups in scan order to help find last significant one 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 9a5a6fdbc7 Simplify two ifs in encode_coeff_nxn-avx2 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 37a2a8bac8 See if loop can be optimized by rearranging 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 584f2f74b6 Vectorize significant coeff group scanning loop 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 1bfed73221 Add AVX2 strategy for encode_coding_tree 2018-12-18 19:41:09 +02:00
Pauli Oikkonen c3a6f3112a Add generic strategy group for encode_coding_tree 2018-12-18 19:41:09 +02:00
Marko Viitanen 1ef851ab4b Disable FME on amp/smp blocks with width or height not divisible by 8 2018-12-18 10:28:21 +02:00
Joose Sainio b71c5573f0 Merge branch 'rate_control_fix' 2018-12-17 12:39:27 +02:00
Sergei Trofimovich 68a70e45a1 x86 asm: mark stack as non-executable
Gentoo's `scanelf` QA tool detects writable/executable stack
of assembly-writtent files as:

```
$ scanelf -qRa  .
 0644 LE !WX --- ---     ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o
 0644 LE !WX --- ---     ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o
 0644 LE !WX --- ---     ./src/strategies/x86_asm/picture-x86-asm-sad.o
 0644 LE !WX --- ---     ./src/strategies/x86_asm/picture-x86-asm-satd.o
```

Normally C compiler emits non-executable stack marking (or GNU assembler
via `-Wa,--noexecstack`).

The change adds non-executable stack marking for yasm-based assmbly files.

https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2018-12-16 11:31:56 +00:00
Reima Hyvönen 1fcc5c6a8d Merge branch 'bipred_recon' 2018-12-11 09:59:35 +02:00
Reima Hyvönen e4a10880f3 Added case 12 to bipred_recon no mov 2018-12-11 09:52:17 +02:00
Marko Viitanen a4f3968e52 Fix Visual Studio errors by initializing some variables used in AVX2 signhiding 2018-12-11 09:33:26 +02:00
Ari Lemmetti ac943147e3 Calculate satd cost for whole non-square blocks as well. 2018-12-10 17:04:29 +02:00
Pauli Oikkonen c465578048 Add a descriptive comment to coefficient reordering 2018-12-03 15:36:32 +02:00
Pauli Oikkonen f78bf2ebcb Optimize q_coefs usage for indexed fetch 2018-12-03 15:36:32 +02:00
Pauli Oikkonen d9591f1b49 Eliminate midway buffering of reordered coefs
TODO: For some mysterious reason seems slightly slower than the
buffered one
2018-12-03 15:36:32 +02:00
Pauli Oikkonen 7fe454c51f Optimize get_cheapest_alternative() 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 6bbd3e5a44 Optimize rearrange_512 function 2018-12-03 15:36:32 +02:00
Pauli Oikkonen cb8209d1b3 Vectorize transform coefficient reordering loop 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 7cf4c7ae5f Rename "reduce" functions to hsum
That's what the functions fundamendally do anyway
2018-12-03 15:36:32 +02:00
Pauli Oikkonen 316cd8a846 Fix ALIGNED keyword and grow alignment to 64B 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 1befc69a4c Implement sign bit hiding in AVX2 2018-12-03 15:36:32 +02:00
Pauli Oikkonen c5cd03497e Require BMI and ABM instruction sets for AVX2 build
AVX2 support on a processor should always imply BMI and ABM support.
The lzcnt and tzcnt instructions have more suitable semantics in the
corner case that source word is 0, and allow us to even handle that
scenario without a branch. Apparently Visual Studio will already
include this support when building with AVX2 enabled, so only the
automake files need to be tweaked.
2018-12-03 15:36:32 +02:00
Reima Hyvönen f8696b54a4 Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks that can be not equal to 8 (ie. width = 12) 2018-11-20 17:09:19 +02:00
Marko Viitanen a5a10a33c3 Enable --scaling-list parameter and add to the documentation 2018-11-19 10:47:30 +02:00
Reima Hyvönen 710ba288db Chroma has some problems 2018-11-15 16:42:48 +02:00
Sami Ahovainio 8f98d4aac7 Added square search 2018-11-14 14:50:31 +02:00
Marko Viitanen 6871490dd5 Simplify get_mvd_coding_cost(), only include golomb coding 2018-11-14 14:33:31 +02:00
Ari Lemmetti a832206bb6 Replace 32-bit incompatible instrinsics 2018-11-12 18:54:33 +02:00
Ari Lemmetti 5c774c4105 Rewrite most of FME and interpolation filters
Changes had to break a lot of stuff and were just squashed into this horrible code dump
2018-11-08 20:21:16 +02:00
Joose Sainio 1c8a1f24e2 Don't assume anything about bits spent 2018-11-07 16:03:38 +02:00
Joose Sainio 3471e2470d Fix using uninitialized value for the first frame 2018-11-07 08:17:39 +02:00
Joose Sainio d95ac11a3b Fix rate_control for other LP-GOPS 2018-11-06 14:20:44 +02:00
Joose Sainio 67a6ba667e Fix rate control for flat lp-gop 2018-11-06 09:38:17 +02:00
Reima Hyvönen 7406c33a42 Some more cleaning 2018-10-26 12:25:18 +03:00
Reima Hyvönen 4c71546b2e Cleaned some coding 2018-10-26 12:19:44 +03:00
Reima Hyvönen 4fe3909e48 Switched luma to use 32bits size ints intstead of 16bit size 2018-10-24 18:24:46 +03:00
Eemeli Kallio 284e73839e Calculating zero cost moved to its own function 2018-10-16 11:02:01 +03:00
Reima Hyvönen 381e786e10 Trying to find the bug in luma 2018-10-11 18:08:41 +03:00
Marko Viitanen c589e5ed36 Fix closed-gop frame feed, the ordering was incorrect after the first GOP 2018-10-10 11:12:03 +03:00
Reima Hyvönen 2f5f81bac3 removed the non-optimated bipred function 2018-10-09 11:19:23 +03:00
Marko Viitanen 75dce4f3ce Fix low-delay-gop usage with --no-open-gop 2018-10-04 15:16:02 +03:00
Marko Viitanen de71b58f76 Change closed GOP structure to include an additional IDR between GOPs 2018-10-04 11:17:03 +03:00
Reima Hyvönen 212a8e68fa Modified to avoid memory overflow, still some bug inside luma 2018-10-02 20:23:32 +03:00
Marko Viitanen 954f07e3d7 Add --(no-)open-gop option 2018-10-02 10:05:32 +03:00
Marko Viitanen 8bef85e056 Merge branch 'set-qp-in-cu' 2018-09-03 08:33:33 +03:00
Ari Lemmetti 2fdcc2b79d Add option --set-qp-in-cu 2018-09-03 08:32:45 +03:00
Reima Hyvönen 896034b7cf Some renamed functions back 2018-08-28 15:31:10 +03:00
Reima Hyvönen e8b5e6db4c Did some merging 2018-08-28 15:26:27 +03:00
Reima Hyvönen 7de5c74434 Updated bipred_recon to work faster 2018-08-28 15:12:31 +03:00
Reima Hyvönen 47b357cca2 Comment one test 2018-08-27 18:52:14 +03:00
Reima Hyvönen 2ca99a44e8 Updated shuffle operation to be in right order 2018-08-27 18:16:38 +03:00
Marko Viitanen b85ae3688e Signal QP in slice header if tiles and slices=tiles are enabled
Keeps the PPS constant for various purposes
2018-08-16 08:44:39 +03:00
Reima Hyvönen 508b218a12 some modifications made to prevent reading too much 2018-08-14 10:50:39 +03:00
Reima Hyvönen 1d935ee888 some useless stuff removed 2018-08-13 16:47:11 +03:00
Reima Hyvönen ce3ac4c05e some modifications to no_mov 2018-08-13 16:41:02 +03:00
Reima Hyvönen 15a613ae94 test if no_mov breaks testing 2018-08-13 16:02:56 +03:00
Reima Hyvönen 97a2049e58 removed pointer declaration out from switch 2018-08-10 16:42:26 +03:00
Reima Hyvönen aa94bcedbc Stream is now pointer 2018-08-10 16:38:49 +03:00
Reima Hyvönen fa5b227ece 256 to 32 doesn't work, made them by hand 2018-08-10 16:01:20 +03:00
Reima Hyvönen 408dedbcc8 removed _mm256_extract_epi8 and replaced with _mm_stream 2018-08-10 15:53:26 +03:00
Reima Hyvönen 31c35091c6 _mm256_cvtsi256_si32 removed 2018-08-10 10:06:40 +03:00
Reima Hyvönen 99dc43074f _mm256_cvtsi256_si32 breaks system, too much bits. back to extract 2018-08-10 09:59:33 +03:00
Reima Hyvönen 4f1f80b2cb Transformed convert from 256 to cast 256 -> 128 and then convert from 128 2018-08-09 15:35:54 +03:00
Reima Hyvönen 4957555eb3 Removed leftover from 939 2018-08-09 15:25:03 +03:00
Reima Hyvönen 28b165c971 Clearified some sections, added _MM_SHUFFLE macro 2018-08-09 15:23:01 +03:00
Reima Hyvönen dd04df8667 testing if error in both avx2 functions 2018-08-03 11:49:00 +03:00
Reima Hyvönen ed50d71fde Switched some variables to different location, altered inter_recon_bipred_avx2 function 2018-08-02 16:08:59 +03:00
Reima Hyvönen f5739a0028 Renaming and removing useless prints 2018-08-02 14:47:17 +03:00
Reima Hyvönen bc09f59bb6 Edited some definitions 2018-08-02 11:54:53 +03:00
Arttu Ylä-Outinen 83555c3d6d Enable --fast-residual-cost with fastest presets 2018-07-16 12:31:20 +03:00
Arttu Ylä-Outinen c438bb4a19 Add an option to skip CABAC for residual costs
Adds command line option --fast-residual-cost=<limit>. When QP is below
the limit, estimates the cost of coding the residual coefficients from
the sum of absolute coefficients. Skipping CABAC is not worth it with
high QPs because there are fewer coefficients so CABAC is not as slow.
2018-07-16 12:31:20 +03:00
Reima Hyvönen a4bf77f208 Tested some extract functions 2018-07-12 09:29:32 +03:00
Reima Hyvönen c05033a893 Even more useless vectors removed 2018-07-11 15:09:14 +03:00
Reima Hyvönen 884cb77238 Removed some not used vectors 2018-07-11 15:06:11 +03:00
Reima Hyvönen 792689a5ff Removed for-loops, added extract instead 2018-07-11 14:56:41 +03:00
Reima Hyvönen f9c7f6ee66 Added some break-operations for avx2 optimation 2018-07-11 14:15:38 +03:00
Reima Hyvönen cc064da143 some more optimation for bipred 2018-07-11 11:27:54 +03:00
Reima Hyvönen 9a339eef89 Merge branch 'bipred_recon' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into HEAD
# Conflicts:
#	build/kvazaar_lib/kvazaar_lib.vcxproj
2018-07-10 16:21:04 +03:00
Reima Hyvönen a22cf03ddb Updated to have no movement function to avx2 strategies 2018-07-10 16:07:15 +03:00
Arttu Ylä-Outinen b7474eb532 Fix SAO buffer sizes
Increases sizes of buffers used for SAO reconstruction to avoid stack
buffer overflow in AVX2 SAO reconstruction.
2018-07-05 15:56:30 +03:00
Arttu Ylä-Outinen b37470e80f
Merge pull request #207 from jbeich/maltivec
Unbreak build on PowerPC if AltiVec isn't supported
2018-07-04 11:06:41 +03:00
Reima Hyvönen ea83ae45f0 Toimiva ratkaisu 2018-07-03 11:18:51 +03:00
Jan Beich 4f4bea7496 Check -maltivec is supported before using
PowerPC target may lack or have non-standard FPU:

$ cc -dumpmachine
powerpcspe-undermydesk-freebsd
$ cc -c -maltivec -Isrc src/strategies/altivec/picture-altivec.c
src/strategies/altivec/picture-altivec.c:1: error: AltiVec and E500 instructions cannot coexist
2018-07-02 23:25:23 +00:00
Jan Beich b892d820f8 Clean up macOS includes on powerpc* after 93e1c9f1c3
strategyselector.c:426:25: machine/cpu.h: No such file or directory
2018-07-02 21:52:45 +00:00
Reima Hyvönen 17babfffa4 25.6 working optimation, ~50% faster than original 2018-06-25 17:06:16 +03:00
Arttu Ylä-Outinen 2f995f4325
Merge pull request #205 from jbeich/powerpc
Unbreak build on non-Linux powerpc*
2018-06-19 13:28:00 +03:00
Arttu Ylä-Outinen c1398ef818 Permit --period=1 with any GOP structure
All intra coding is a special case so it can be permitted even though
Kvazaar normally only supports intra periods that are divisible by the
GOP length.
2018-06-18 12:26:11 +03:00