Commit graph

2345 commits

Author SHA1 Message Date
Pauli Oikkonen cb8209d1b3 Vectorize transform coefficient reordering loop 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 7cf4c7ae5f Rename "reduce" functions to hsum
That's what the functions fundamentally do anyway
2018-12-03 15:36:32 +02:00
Pauli Oikkonen 316cd8a846 Fix ALIGNED keyword and grow alignment to 64B 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 1befc69a4c Implement sign bit hiding in AVX2 2018-12-03 15:36:32 +02:00
Pauli Oikkonen c5cd03497e Require BMI and ABM instruction sets for AVX2 build
AVX2 support on a processor should always imply BMI and ABM support.
The lzcnt and tzcnt instructions have more suitable semantics in the
corner case that source word is 0, and allow us to even handle that
scenario without a branch. Apparently Visual Studio will already
include this support when building with AVX2 enabled, so only the
automake files need to be tweaked.
2018-12-03 15:36:32 +02:00
Reima Hyvönen f8696b54a4 Updated bipred_recon_avx2 in avx2/picture-avx2.c. It now detects blocks whose width is not equal to 8 (e.g. width = 12) 2018-11-20 17:09:19 +02:00
Marko Viitanen a5a10a33c3 Enable --scaling-list parameter and add to the documentation 2018-11-19 10:47:30 +02:00
Reima Hyvönen 710ba288db Chroma has some problems 2018-11-15 16:42:48 +02:00
Sami Ahovainio 8f98d4aac7 Added square search 2018-11-14 14:50:31 +02:00
Marko Viitanen 6871490dd5 Simplify get_mvd_coding_cost(), only include golomb coding 2018-11-14 14:33:31 +02:00
Ari Lemmetti a832206bb6 Replace 32-bit incompatible intrinsics 2018-11-12 18:54:33 +02:00
Ari Lemmetti 5c774c4105 Rewrite most of FME and interpolation filters
Changes had to break a lot of stuff and were just squashed into this horrible code dump
2018-11-08 20:21:16 +02:00
Joose Sainio 1c8a1f24e2 Don't assume anything about bits spent 2018-11-07 16:03:38 +02:00
Joose Sainio 3471e2470d Fix using uninitialized value for the first frame 2018-11-07 08:17:39 +02:00
Joose Sainio d95ac11a3b Fix rate_control for other LP-GOPS 2018-11-06 14:20:44 +02:00
Joose Sainio 67a6ba667e Fix rate control for flat lp-gop 2018-11-06 09:38:17 +02:00
Reima Hyvönen 7406c33a42 Some more cleaning 2018-10-26 12:25:18 +03:00
Reima Hyvönen 4c71546b2e Cleaned some coding 2018-10-26 12:19:44 +03:00
Reima Hyvönen 4fe3909e48 Switched luma to use 32-bit ints instead of 16-bit ints 2018-10-24 18:24:46 +03:00
Eemeli Kallio 284e73839e Calculating zero cost moved to its own function 2018-10-16 11:02:01 +03:00
Reima Hyvönen 381e786e10 Trying to find the bug in luma 2018-10-11 18:08:41 +03:00
Marko Viitanen c589e5ed36 Fix closed-gop frame feed, the ordering was incorrect after the first GOP 2018-10-10 11:12:03 +03:00
Reima Hyvönen 2f5f81bac3 Removed the non-optimized bipred function 2018-10-09 11:19:23 +03:00
Marko Viitanen 75dce4f3ce Fix low-delay-gop usage with --no-open-gop 2018-10-04 15:16:02 +03:00
Marko Viitanen de71b58f76 Change closed GOP structure to include an additional IDR between GOPs 2018-10-04 11:17:03 +03:00
Reima Hyvönen 212a8e68fa Modified to avoid memory overflow, still some bug inside luma 2018-10-02 20:23:32 +03:00
Marko Viitanen 954f07e3d7 Add --(no-)open-gop option 2018-10-02 10:05:32 +03:00
Marko Viitanen 8bef85e056 Merge branch 'set-qp-in-cu' 2018-09-03 08:33:33 +03:00
Ari Lemmetti 2fdcc2b79d Add option --set-qp-in-cu 2018-09-03 08:32:45 +03:00
Reima Hyvönen 896034b7cf Renamed some functions back 2018-08-28 15:31:10 +03:00
Reima Hyvönen e8b5e6db4c Did some merging 2018-08-28 15:26:27 +03:00
Reima Hyvönen 7de5c74434 Updated bipred_recon to work faster 2018-08-28 15:12:31 +03:00
Reima Hyvönen 47b357cca2 Comment one test 2018-08-27 18:52:14 +03:00
Reima Hyvönen 2ca99a44e8 Updated shuffle operation to be in right order 2018-08-27 18:16:38 +03:00
Marko Viitanen b85ae3688e Signal QP in slice header if tiles and slices=tiles are enabled
Keeps the PPS constant for various purposes
2018-08-16 08:44:39 +03:00
Reima Hyvönen 508b218a12 some modifications made to prevent reading too much 2018-08-14 10:50:39 +03:00
Reima Hyvönen 1d935ee888 some useless stuff removed 2018-08-13 16:47:11 +03:00
Reima Hyvönen ce3ac4c05e some modifications to no_mov 2018-08-13 16:41:02 +03:00
Reima Hyvönen 15a613ae94 test if no_mov breaks testing 2018-08-13 16:02:56 +03:00
Reima Hyvönen 97a2049e58 removed pointer declaration out from switch 2018-08-10 16:42:26 +03:00
Reima Hyvönen aa94bcedbc Stream is now pointer 2018-08-10 16:38:49 +03:00
Reima Hyvönen fa5b227ece 256 to 32 doesn't work, made them by hand 2018-08-10 16:01:20 +03:00
Reima Hyvönen 408dedbcc8 removed _mm256_extract_epi8 and replaced with _mm_stream 2018-08-10 15:53:26 +03:00
Reima Hyvönen 31c35091c6 _mm256_cvtsi256_si32 removed 2018-08-10 10:06:40 +03:00
Reima Hyvönen 99dc43074f _mm256_cvtsi256_si32 breaks system, too many bits. Back to extract 2018-08-10 09:59:33 +03:00
Reima Hyvönen 4f1f80b2cb Transformed convert from 256 to cast 256 -> 128 and then convert from 128 2018-08-09 15:35:54 +03:00
Reima Hyvönen 4957555eb3 Removed leftover from 939 2018-08-09 15:25:03 +03:00
Reima Hyvönen 28b165c971 Clarified some sections, added _MM_SHUFFLE macro 2018-08-09 15:23:01 +03:00
Reima Hyvönen dd04df8667 testing if error in both avx2 functions 2018-08-03 11:49:00 +03:00
Reima Hyvönen ed50d71fde Switched some variables to different location, altered inter_recon_bipred_avx2 function 2018-08-02 16:08:59 +03:00
Reima Hyvönen f5739a0028 Renaming and removing useless prints 2018-08-02 14:47:17 +03:00
Reima Hyvönen bc09f59bb6 Edited some definitions 2018-08-02 11:54:53 +03:00
Arttu Ylä-Outinen 83555c3d6d Enable --fast-residual-cost with fastest presets 2018-07-16 12:31:20 +03:00
Arttu Ylä-Outinen c438bb4a19 Add an option to skip CABAC for residual costs
Adds command line option --fast-residual-cost=<limit>. When QP is below
the limit, estimates the cost of coding the residual coefficients from
the sum of absolute coefficients. Skipping CABAC is not worth it with
high QPs because there are fewer coefficients so CABAC is not as slow.
2018-07-16 12:31:20 +03:00
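
The estimate described in the commit above can be sketched as follows. This is a minimal illustration, not Kvazaar's actual code: the function names `sum_abs_coeffs` and `use_fast_residual_cost` are hypothetical, and the mapping from the sum to a bit cost is not given in the commit message, so it is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: sum of absolute values of the residual coefficients.
   The commit says the coding cost is estimated from this sum; how the sum is
   scaled into bits is not stated there and is not modeled here. */
static uint32_t sum_abs_coeffs(const int16_t *coeff, size_t count)
{
  assert(coeff != NULL);
  uint32_t sum = 0;
  for (size_t i = 0; i < count; i++) {
    sum += (uint32_t)abs(coeff[i]);  /* int16_t promotes to int, so abs() is safe */
  }
  return sum;
}

/* Hypothetical gate: the fast estimate is used only when QP is below the
   limit given with --fast-residual-cost=<limit>. */
static int use_fast_residual_cost(int qp, int limit)
{
  return qp < limit;
}
```

At higher QPs the gate returns 0 and the encoder would fall back to the CABAC-based cost, which the commit message says is cheap enough there because fewer coefficients remain.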
Reima Hyvönen a4bf77f208 Tested some extract functions 2018-07-12 09:29:32 +03:00
Reima Hyvönen c05033a893 Even more useless vectors removed 2018-07-11 15:09:14 +03:00
Reima Hyvönen 884cb77238 Removed some not used vectors 2018-07-11 15:06:11 +03:00
Reima Hyvönen 792689a5ff Removed for-loops, added extract instead 2018-07-11 14:56:41 +03:00
Reima Hyvönen f9c7f6ee66 Added some break-operations for avx2 optimization 2018-07-11 14:15:38 +03:00
Reima Hyvönen cc064da143 some more optimization for bipred 2018-07-11 11:27:54 +03:00
Reima Hyvönen 9a339eef89 Merge branch 'bipred_recon' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into HEAD
# Conflicts:
#	build/kvazaar_lib/kvazaar_lib.vcxproj
2018-07-10 16:21:04 +03:00
Reima Hyvönen a22cf03ddb Updated to have no movement function to avx2 strategies 2018-07-10 16:07:15 +03:00
Arttu Ylä-Outinen b7474eb532 Fix SAO buffer sizes
Increases sizes of buffers used for SAO reconstruction to avoid stack
buffer overflow in AVX2 SAO reconstruction.
2018-07-05 15:56:30 +03:00
Arttu Ylä-Outinen b37470e80f
Merge pull request #207 from jbeich/maltivec
Unbreak build on PowerPC if AltiVec isn't supported
2018-07-04 11:06:41 +03:00
Reima Hyvönen ea83ae45f0 Working solution 2018-07-03 11:18:51 +03:00
Jan Beich 4f4bea7496 Check -maltivec is supported before using
PowerPC target may lack or have non-standard FPU:

$ cc -dumpmachine
powerpcspe-undermydesk-freebsd
$ cc -c -maltivec -Isrc src/strategies/altivec/picture-altivec.c
src/strategies/altivec/picture-altivec.c:1: error: AltiVec and E500 instructions cannot coexist
2018-07-02 23:25:23 +00:00
Jan Beich b892d820f8 Clean up macOS includes on powerpc* after 93e1c9f1c3
strategyselector.c:426:25: machine/cpu.h: No such file or directory
2018-07-02 21:52:45 +00:00
Reima Hyvönen 17babfffa4 25.6 working optimization, ~50% faster than original 2018-06-25 17:06:16 +03:00
Arttu Ylä-Outinen 2f995f4325
Merge pull request #205 from jbeich/powerpc
Unbreak build on non-Linux powerpc*
2018-06-19 13:28:00 +03:00
Arttu Ylä-Outinen c1398ef818 Permit --period=1 with any GOP structure
All intra coding is a special case so it can be permitted even though
Kvazaar normally only supports intra periods that are divisible by the
GOP length.
2018-06-18 12:26:11 +03:00
Arttu Ylä-Outinen abdebe0bf9 Fix --owf help message
The number of parallel frames is --owf plus one, not --owf minus one.

Fixes #204.
2018-06-18 09:33:36 +03:00
Jan Beich 93e1c9f1c3 Add AltiVec detection for BSDs
strategyselector.c:377:26: linux/auxvec.h: No such file or directory
2018-06-17 15:38:24 +00:00
Miika Metsoila 98972d26c2 Document that the high tier requires level 4 or higher 2018-06-14 12:41:03 +03:00
Miika Metsoila 62b44efaa4 Write the encoding tier (main/high) into the bitstream 2018-06-14 12:41:03 +03:00
Arttu Ylä-Outinen a343f6d587 Prepare for delta QPs at CU-level
- Replaces lcu_dqp_enabled with max_qp_delta_depth in encoder_control_t.
- Fixes set_cu_qps so that it can handle quantization groups of
  arbitrary size.
- Fixes computation of QP predictors so that it works for quantization
  groups of arbitrary size.
2018-06-13 15:36:19 +03:00
Arttu Ylä-Outinen dc6b2024ea Modify reference count asserts to fix data races
Changes asserts on the reference count of objects to assert the value
after KVZ_ATOMIC_INC instead of directly checking the value. Fixes some
data races detected by TSan.
2018-06-12 09:35:07 +03:00
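
The pattern described in the commit above, asserting on the value produced by the atomic increment instead of reading the counter separately, might look like this. The sketch uses C11 `<stdatomic.h>` in place of Kvazaar's `KVZ_ATOMIC_INC` macro, and `object_t` and `object_copy_ref` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical reference-counted object; not Kvazaar's actual struct. */
typedef struct {
  atomic_int refcount;
} object_t;

/* Take an additional reference. The assert checks the value derived from the
   atomic increment itself, so the counter is never read a second time. */
static object_t *object_copy_ref(object_t *obj)
{
  /* atomic_fetch_add returns the value held before the increment. */
  int new_count = atomic_fetch_add(&obj->refcount, 1) + 1;
  assert(new_count > 1);  /* the caller must already hold a reference */
  (void)new_count;        /* silence unused-variable warnings under NDEBUG */
  return obj;
}
```

Because the checked value comes from the same atomic operation that modifies the counter, a tool such as TSan has no separate unsynchronized read to report.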
Ari Lemmetti 4fb1c16c61 Add early termination for intra rdo when a zero coefficient block is found. 2018-06-08 21:03:07 +03:00
Ari Lemmetti 492529fb7a Add the same comment to help message as well... 2018-05-30 14:13:15 +03:00
Ari Lemmetti 0d5972bf03 Add missing sort to intra transform split search so mode at 0 is the best 2018-05-21 13:10:38 +03:00
Sebastien Alaiwan 954bca7d6e Fix memset parameter 2018-05-17 11:24:49 +02:00
Jaakko Laitinen f9466efcbb Close file on error 2018-05-15 11:50:16 +03:00
Reima Hyvönen 9fed29f950 optimization for inter_recon_bipred 2018-04-18 15:25:44 +03:00
Arttu Ylä-Outinen 5c585c4fbc Update help message
Updates the default option values to match the medium preset.
2018-04-03 10:40:37 +03:00
Arttu Ylä-Outinen 2b4e22111a Update presets
The new presets are slower but have better coding efficiency.
2018-04-03 10:37:30 +03:00
Arttu Ylä-Outinen 7185519a1b Update command line help
- Adds missing default values.
- Adds help for --crypto and --key.
- Adds help for --rd=3.
- Adds help for --sao options.
- Some changes to help wording.
2018-03-23 14:33:04 +02:00
Arttu Ylä-Outinen 3606860504 Add --no-cpuid option
Equivalent to --cpuid=0.
2018-03-23 12:32:27 +02:00
Arttu Ylä-Outinen fb462b25ef Fix transform skip for inter
The transform skip flag in cu_info_t was stored under the intra
substruct even though transform skip can be used for inter as well. This
caused bitstream errors. Fixed by moving the flag out of the substruct.
2018-03-20 11:01:33 +02:00
Arttu Ylä-Outinen b64e46707d Skip raster scan step in TZ search
Raster scan is very slow and the BD-rate improvement is marginal.
2018-03-01 14:04:03 +02:00
Arttu Ylä-Outinen 6877064230 Add zero neighborhood check to TZ search
Adds an additional grid search step that starts from the zero motion
vector after the normal grid search. The search range for this step is
half of the normal range.
2018-03-01 14:02:13 +02:00
Arttu Ylä-Outinen 74a413c46a Switch to star refinement in TZ search 2018-03-01 13:06:14 +02:00
Arttu Ylä-Outinen ebee428ee1 Add loop termination to TZ grid search
Terminates the grid search if no better motion vector was found in the
last three iterations.
2018-03-01 13:06:06 +02:00
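
The termination rule from the commit above can be sketched as a loop that stops after three consecutive iterations without improvement. Everything here is illustrative: `grid_search`, `ring_cost_fn`, and `example_ring_cost` are hypothetical names, and doubling the distance on each iteration is an assumption about how the grid expands.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical callback: best cost found on the grid ring at this distance. */
typedef unsigned (*ring_cost_fn)(int distance);

/* Toy cost function with its minimum at distance 2, for demonstration only. */
static unsigned example_ring_cost(int distance)
{
  int d = distance - 2;
  if (d < 0) d = -d;
  return (unsigned)(d * 10 + 5);
}

/* Returns the distance with the lowest cost and stores the number of rings
   that were evaluated. Stops early once three iterations in a row have
   failed to find a better cost. */
static int grid_search(ring_cost_fn ring_cost, int max_distance, int *rings_evaluated)
{
  assert(ring_cost != NULL && rings_evaluated != NULL && max_distance >= 1);

  unsigned best_cost = UINT_MAX;
  int best_distance = 0;
  int rounds_without_improvement = 0;
  *rings_evaluated = 0;

  /* Assumption: the search distance doubles each iteration. */
  for (int dist = 1; dist <= max_distance; dist *= 2) {
    unsigned cost = ring_cost(dist);
    (*rings_evaluated)++;
    if (cost < best_cost) {
      best_cost = cost;
      best_distance = dist;
      rounds_without_improvement = 0;
    } else if (++rounds_without_improvement >= 3) {
      break;  /* no better vector in the last three iterations */
    }
  }
  return best_distance;
}
```

With the toy cost function and a maximum distance of 64, the loop evaluates distances 1, 2, 4, 8, and 16 and then stops, skipping 32 and 64.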
Arttu Ylä-Outinen 4c175621dd Fix TZ grid search and star refinement
- Changes TZ grid search and star refinement to keep the origin constant
  instead of moving to the best position after each iteration.
- Changes star refinement to loop until there is no more improvement,
  instead of running the step only once.
2018-03-01 12:56:57 +02:00
Arttu Ylä-Outinen 9c2d0074a2 Add rounding of motion vectors in inter search
When the starting point for integer motion estimation was selected among
the merge candidates, the candidate motion vectors were always rounded
down. This commit changes the rounding so that they are rounded to the
nearest integer MV instead.
2018-03-01 09:39:21 +02:00
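
The difference between the two rounding modes in the commit above can be illustrated as follows. The sketch assumes motion vector components are stored in quarter-pel units, which the commit message does not state, and treats "rounded down" as flooring; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Floor division by 4 that is well defined for negative values.
   C division truncates toward zero, so subtract 1 when there is a
   negative remainder. */
static int floor_div4(int v)
{
  return v / 4 - (v % 4 < 0);
}

/* Old behavior as described: always round down to an integer MV.
   Assumption: mv_qpel is in quarter-pel units. */
static int mv_round_down(int mv_qpel)
{
  return floor_div4(mv_qpel);
}

/* New behavior as described: round to the nearest integer MV.
   Adding 2 (half a pixel in quarter-pel units) before flooring
   rounds halves upward. */
static int mv_round_nearest(int mv_qpel)
{
  assert(mv_qpel <= INT_MAX - 2);  /* guard against overflow of the offset */
  return floor_div4(mv_qpel + 2);
}
```

For a quarter-pel component of 7 (1.75 pixels), rounding down starts the integer search at 1 while rounding to nearest starts it at 2, which is closer to the candidate's actual position.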
Ari Lemmetti 662430d441 Select CU type based on SSD, transform unit tree and mode cost of luma and chroma on --rd=2 2018-02-22 19:26:48 +02:00
Arttu Ylä-Outinen cb06cfeadb Drop temporary arrays in bipred search
Changes bipred search to use the original source and reconstruction
arrays directly instead of copying them.
2018-02-14 11:20:51 +02:00
Arttu Ylä-Outinen 0ea516ba30 Move bipred search to a separate function 2018-02-14 09:56:53 +02:00
Arttu Ylä-Outinen 6f506be12d Drop dynamic allocation from bipred search
Moves the temporary LCU struct used in bipred search from the heap to
the stack. The single malloc call was a huge bottleneck in bipred.
2018-02-14 09:55:02 +02:00
Arttu Ylä-Outinen 7155dd0db7 Add negative references to L1 list
Changes reference index list creation so that the negative references
are added to L1 in addition to L0 when biprediction is enabled and no
reordering of pictures is done. Biprediction can now be used with the
low-delay GOP structure.
2018-02-07 14:54:52 +02:00
Arttu Ylä-Outinen 4b24cd03a2 Update for crypto++ 6.0.0 compatibility
Changes the crypto module to use unsigned char instead of byte. The byte
typedef is no longer included in the global namespace in crypto++ 6.0.0.
See https://github.com/weidai11/cryptopp/issues/442.

Fixes #184.
2018-02-05 13:35:03 +02:00
Arttu Ylä-Outinen 8c53417006 Check zero coefficient cost for inter
Checks the cost of flushing all coefficients of an inter block to zero.
This is much faster than doing full RDOQ but can still reduce bitrate
significantly. Encoding speed is increased since fewer coefficient bits
have to be coded with CABAC.
2018-01-29 12:41:56 +02:00