Pauli Oikkonen
cb8209d1b3
Vectorize transform coefficient reordering loop
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
7cf4c7ae5f
Rename "reduce" functions to hsum
...
That's what the functions fundamentally do anyway
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
316cd8a846
Fix ALIGNED keyword and grow alignment to 64B
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
1befc69a4c
Implement sign bit hiding in AVX2
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
c5cd03497e
Require BMI and ABM instruction sets for AVX2 build
...
AVX2 support on a processor should always imply BMI and ABM support.
The lzcnt and tzcnt instructions have more suitable semantics in the
corner case that source word is 0, and allow us to even handle that
scenario without a branch. Apparently Visual Studio will already
include this support when building with AVX2 enabled, so only the
automake files need to be tweaked.
2018-12-03 15:36:32 +02:00
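The corner case mentioned in c5cd03497e can be illustrated with a small portable model. This is only a sketch of the instruction semantics, not code from the commit: on x86, LZCNT and TZCNT return the operand width (32 here) when the source is 0, whereas the older BSR and BSF leave the destination undefined in that case. The function names `tzcnt32_model` and `lzcnt32_model` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of 32-bit TZCNT: counts trailing zero bits.
 * Returns 32 when x == 0, so callers need no special-case branch. */
static unsigned tzcnt32_model(uint32_t x)
{
  unsigned n = 0;
  while (n < 32 && ((x >> n) & 1u) == 0) {
    n++;
  }
  return n;
}

/* Portable model of 32-bit LZCNT: counts leading zero bits.
 * Returns 32 when x == 0. */
static unsigned lzcnt32_model(uint32_t x)
{
  unsigned n = 0;
  while (n < 32 && ((x << n) & 0x80000000u) == 0) {
    n++;
  }
  return n;
}
```

With these semantics, a zero input simply yields 32, so code that would otherwise test for zero before a bit scan can use the result directly.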
Reima Hyvönen
f8696b54a4
Updated bipred_recon_avx2 in avx2/picture-avx2.c. It now detects blocks whose width may not be 8 (e.g. width = 12)
2018-11-20 17:09:19 +02:00
Marko Viitanen
a5a10a33c3
Enable --scaling-list parameter and add to the documentation
2018-11-19 10:47:30 +02:00
Reima Hyvönen
710ba288db
Chroma has some problems
2018-11-15 16:42:48 +02:00
Sami Ahovainio
8f98d4aac7
Added square search
2018-11-14 14:50:31 +02:00
Marko Viitanen
6871490dd5
Simplify get_mvd_coding_cost(), only include golomb coding
2018-11-14 14:33:31 +02:00
Ari Lemmetti
a832206bb6
Replace 32-bit incompatible intrinsics
2018-11-12 18:54:33 +02:00
Ari Lemmetti
5c774c4105
Rewrite most of FME and interpolation filters
...
Changes had to break a lot of stuff and were just squashed into this horrible code dump
2018-11-08 20:21:16 +02:00
Joose Sainio
1c8a1f24e2
Don't assume anything about bits spent
2018-11-07 16:03:38 +02:00
Joose Sainio
3471e2470d
Fix using uninitialized value for the first frame
2018-11-07 08:17:39 +02:00
Joose Sainio
d95ac11a3b
Fix rate_control for other LP-GOPS
2018-11-06 14:20:44 +02:00
Joose Sainio
67a6ba667e
Fix rate control for flat lp-gop
2018-11-06 09:38:17 +02:00
Reima Hyvönen
7406c33a42
Some more cleaning
2018-10-26 12:25:18 +03:00
Reima Hyvönen
4c71546b2e
Cleaned some coding
2018-10-26 12:19:44 +03:00
Reima Hyvönen
4fe3909e48
Switched luma to use 32-bit ints instead of 16-bit
2018-10-24 18:24:46 +03:00
Eemeli Kallio
284e73839e
Calculating zero cost moved to its own function
2018-10-16 11:02:01 +03:00
Reima Hyvönen
381e786e10
Trying to find the bug in luma
2018-10-11 18:08:41 +03:00
Marko Viitanen
c589e5ed36
Fix closed-gop frame feed, the ordering was incorrect after the first GOP
2018-10-10 11:12:03 +03:00
Reima Hyvönen
2f5f81bac3
Removed the non-optimized bipred function
2018-10-09 11:19:23 +03:00
Marko Viitanen
75dce4f3ce
Fix low-delay-gop usage with --no-open-gop
2018-10-04 15:16:02 +03:00
Marko Viitanen
de71b58f76
Change closed GOP structure to include an additional IDR between GOPs
2018-10-04 11:17:03 +03:00
Reima Hyvönen
212a8e68fa
Modified to avoid memory overflow, still some bug inside luma
2018-10-02 20:23:32 +03:00
Marko Viitanen
954f07e3d7
Add --(no-)open-gop option
2018-10-02 10:05:32 +03:00
Marko Viitanen
8bef85e056
Merge branch 'set-qp-in-cu'
2018-09-03 08:33:33 +03:00
Ari Lemmetti
2fdcc2b79d
Add option --set-qp-in-cu
2018-09-03 08:32:45 +03:00
Reima Hyvönen
896034b7cf
Renamed some functions back
2018-08-28 15:31:10 +03:00
Reima Hyvönen
e8b5e6db4c
Did some merging
2018-08-28 15:26:27 +03:00
Reima Hyvönen
7de5c74434
Updated bipred_recon to work faster
2018-08-28 15:12:31 +03:00
Reima Hyvönen
47b357cca2
Comment one test
2018-08-27 18:52:14 +03:00
Reima Hyvönen
2ca99a44e8
Updated shuffle operation to be in right order
2018-08-27 18:16:38 +03:00
Marko Viitanen
b85ae3688e
Signal QP in slice header if tiles and slices=tiles are enabled
...
Keeps the PPS constant for various purposes
2018-08-16 08:44:39 +03:00
Reima Hyvönen
508b218a12
some modifications made to prevent reading too much
2018-08-14 10:50:39 +03:00
Reima Hyvönen
1d935ee888
some useless stuff removed
2018-08-13 16:47:11 +03:00
Reima Hyvönen
ce3ac4c05e
some modifications to no_mov
2018-08-13 16:41:02 +03:00
Reima Hyvönen
15a613ae94
test if no_mov breaks testing
2018-08-13 16:02:56 +03:00
Reima Hyvönen
97a2049e58
removed pointer declaration out from switch
2018-08-10 16:42:26 +03:00
Reima Hyvönen
aa94bcedbc
Stream is now pointer
2018-08-10 16:38:49 +03:00
Reima Hyvönen
fa5b227ece
256 to 32 doesn't work, made them by hand
2018-08-10 16:01:20 +03:00
Reima Hyvönen
408dedbcc8
removed _mm256_extract_epi8 and replaced with _mm_stream
2018-08-10 15:53:26 +03:00
Reima Hyvönen
31c35091c6
_mm256_cvtsi256_si32 removed
2018-08-10 10:06:40 +03:00
Reima Hyvönen
99dc43074f
_mm256_cvtsi256_si32 breaks system, too many bits. Back to extract
2018-08-10 09:59:33 +03:00
Reima Hyvönen
4f1f80b2cb
Transformed convert from 256 to cast 256 -> 128 and then convert from 128
2018-08-09 15:35:54 +03:00
Reima Hyvönen
4957555eb3
Removed leftover from 939
2018-08-09 15:25:03 +03:00
Reima Hyvönen
28b165c971
Clarified some sections, added _MM_SHUFFLE macro
2018-08-09 15:23:01 +03:00
Reima Hyvönen
dd04df8667
testing if error in both avx2 functions
2018-08-03 11:49:00 +03:00
Reima Hyvönen
ed50d71fde
Switched some variables to different location, altered inter_recon_bipred_avx2 function
2018-08-02 16:08:59 +03:00
Reima Hyvönen
f5739a0028
Renaming and removing useless prints
2018-08-02 14:47:17 +03:00
Reima Hyvönen
bc09f59bb6
Edited some definitions
2018-08-02 11:54:53 +03:00
Arttu Ylä-Outinen
83555c3d6d
Enable --fast-residual-cost with fastest presets
2018-07-16 12:31:20 +03:00
Arttu Ylä-Outinen
c438bb4a19
Add an option to skip CABAC for residual costs
...
Adds command line option --fast-residual-cost=<limit>. When QP is below
the limit, estimates the cost of coding the residual coefficients from
the sum of absolute coefficients. Skipping CABAC is not worth it with
high QPs because there are fewer coefficients so CABAC is not as slow.
2018-07-16 12:31:20 +03:00
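The idea described in c438bb4a19 can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoder's implementation: `coeff_t` as a 16-bit type, and the names `sum_abs_coeffs` and `use_fast_residual_cost` are hypothetical. How the sum is mapped to a bit cost is not stated in the commit message and is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef int16_t coeff_t; /* assumption: coefficients are 16-bit signed */

/* Sum of absolute coefficient values, used as a cheap proxy for the
 * number of bits CABAC would spend on the residual. */
static uint32_t sum_abs_coeffs(const coeff_t *coeffs, size_t count)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < count; i++) {
    sum += (uint32_t)abs(coeffs[i]);
  }
  return sum;
}

/* The fast estimate is only used when QP is below the limit given
 * with --fast-residual-cost=<limit>. */
static int use_fast_residual_cost(int qp, int limit)
{
  return qp < limit;
}
```

At high QPs most coefficients quantize to zero, so the exact CABAC estimate is already cheap and the approximation gains little.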
Reima Hyvönen
a4bf77f208
Tested some extract functions
2018-07-12 09:29:32 +03:00
Reima Hyvönen
c05033a893
Even more useless vectors removed
2018-07-11 15:09:14 +03:00
Reima Hyvönen
884cb77238
Removed some not used vectors
2018-07-11 15:06:11 +03:00
Reima Hyvönen
792689a5ff
Removed for-loops, added extract instead
2018-07-11 14:56:41 +03:00
Reima Hyvönen
f9c7f6ee66
Added some break-operations for avx2 optimization
2018-07-11 14:15:38 +03:00
Reima Hyvönen
cc064da143
Some more optimization for bipred
2018-07-11 11:27:54 +03:00
Reima Hyvönen
9a339eef89
Merge branch 'bipred_recon' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into HEAD
...
# Conflicts:
# build/kvazaar_lib/kvazaar_lib.vcxproj
2018-07-10 16:21:04 +03:00
Reima Hyvönen
a22cf03ddb
Added the no-movement function to the avx2 strategies
2018-07-10 16:07:15 +03:00
Arttu Ylä-Outinen
b7474eb532
Fix SAO buffer sizes
...
Increases sizes of buffers used for SAO reconstruction to avoid stack
buffer overflow in AVX2 SAO reconstruction.
2018-07-05 15:56:30 +03:00
Arttu Ylä-Outinen
b37470e80f
Merge pull request #207 from jbeich/maltivec
...
Unbreak build on PowerPC if AltiVec isn't supported
2018-07-04 11:06:41 +03:00
Reima Hyvönen
ea83ae45f0
Working solution
2018-07-03 11:18:51 +03:00
Jan Beich
4f4bea7496
Check -maltivec is supported before using
...
PowerPC target may lack or have non-standard FPU:
$ cc -dumpmachine
powerpcspe-undermydesk-freebsd
$ cc -c -maltivec -Isrc src/strategies/altivec/picture-altivec.c
src/strategies/altivec/picture-altivec.c:1: error: AltiVec and E500 instructions cannot coexist
2018-07-02 23:25:23 +00:00
Jan Beich
b892d820f8
Clean up macOS includes on powerpc* after 93e1c9f1c3
...
strategyselector.c:426:25: machine/cpu.h: No such file or directory
2018-07-02 21:52:45 +00:00
Reima Hyvönen
17babfffa4
25.6 working optimization, ~50% faster than original
2018-06-25 17:06:16 +03:00
Arttu Ylä-Outinen
2f995f4325
Merge pull request #205 from jbeich/powerpc
...
Unbreak build on non-Linux powerpc*
2018-06-19 13:28:00 +03:00
Arttu Ylä-Outinen
c1398ef818
Permit --period=1 with any GOP structure
...
All intra coding is a special case so it can be permitted even though
Kvazaar normally only supports intra periods that are divisible by the
GOP length.
2018-06-18 12:26:11 +03:00
Arttu Ylä-Outinen
abdebe0bf9
Fix --owf help message
...
The number of parallel frames is --owf plus one, not --owf minus one.
Fixes #204 .
2018-06-18 09:33:36 +03:00
Jan Beich
93e1c9f1c3
Add AltiVec detection for BSDs
...
strategyselector.c:377:26: linux/auxvec.h: No such file or directory
2018-06-17 15:38:24 +00:00
Miika Metsoila
98972d26c2
Document that the high tier requires level 4 or higher
2018-06-14 12:41:03 +03:00
Miika Metsoila
62b44efaa4
Write the encoding tier (main/high) into the bitstream
2018-06-14 12:41:03 +03:00
Arttu Ylä-Outinen
a343f6d587
Prepare for delta QPs at CU-level
...
- Replaces lcu_dqp_enabled with max_qp_delta_depth in encoder_control_t.
- Fixes set_cu_qps so that it can handle quantization groups of
arbitrary size.
- Fixes computation of QP predictors so that it works for quantization
groups of arbitrary size.
2018-06-13 15:36:19 +03:00
Arttu Ylä-Outinen
dc6b2024ea
Modify reference count asserts to fix data races
...
Changes asserts on the reference count of objects to assert the value
after KVZ_ATOMIC_INC instead of directly checking the value. Fixes some
data races detected by TSan.
2018-06-12 09:35:07 +03:00
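The pattern described in dc6b2024ea can be sketched with C11 atomics. This is an illustration only: `atomic_fetch_add` stands in for KVZ_ATOMIC_INC, whose definition is not shown in this log, and `object_t` and `object_ref` are hypothetical names. The point is that the assert inspects the value produced by the atomic increment itself rather than performing a separate read of the counter, which could race with other threads.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
  atomic_int refcount; /* hypothetical reference-counted object */
} object_t;

/* Take a new reference. The assert checks the result of the atomic
 * increment, so no separate unsynchronized read of the counter is made. */
static int object_ref(object_t *obj)
{
  int new_count = atomic_fetch_add(&obj->refcount, 1) + 1;
  /* The caller must already hold a reference, so the count was >= 1. */
  assert(new_count > 1);
  return new_count;
}
```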
Ari Lemmetti
4fb1c16c61
Add early termination for intra rdo when a zero coefficient block is found.
2018-06-08 21:03:07 +03:00
Ari Lemmetti
492529fb7a
Add the same comment to help message as well...
2018-05-30 14:13:15 +03:00
Ari Lemmetti
0d5972bf03
Add missing sort to intra transform split search so mode at 0 is the best
2018-05-21 13:10:38 +03:00
Sebastien Alaiwan
954bca7d6e
Fix memset parameter
2018-05-17 11:24:49 +02:00
Jaakko Laitinen
f9466efcbb
Close file on error
2018-05-15 11:50:16 +03:00
Reima Hyvönen
9fed29f950
Optimization for inter_recon_bipred
2018-04-18 15:25:44 +03:00
Arttu Ylä-Outinen
5c585c4fbc
Update help message
...
Updates the default option values to match the medium preset.
2018-04-03 10:40:37 +03:00
Arttu Ylä-Outinen
2b4e22111a
Update presets
...
The new presets are slower but have better coding efficiency.
2018-04-03 10:37:30 +03:00
Arttu Ylä-Outinen
7185519a1b
Update command line help
...
- Adds missing default values.
- Adds help for --crypto and --key.
- Adds help for --rd=3.
- Adds help for --sao options.
- Some changes to help wording.
2018-03-23 14:33:04 +02:00
Arttu Ylä-Outinen
3606860504
Add --no-cpuid option
...
Equivalent to --cpuid=0.
2018-03-23 12:32:27 +02:00
Arttu Ylä-Outinen
fb462b25ef
Fix transform skip for inter
...
The transform skip flag in cu_info_t was stored under the intra
substruct even though transform skip can be used for inter as well. This
caused bitstream errors. Fixed by moving the flag out of the substruct.
2018-03-20 11:01:33 +02:00
Arttu Ylä-Outinen
b64e46707d
Skip raster scan step in TZ search
...
Raster scan is very slow and the BD-rate improvement is marginal.
2018-03-01 14:04:03 +02:00
Arttu Ylä-Outinen
6877064230
Add zero neighborhood check to TZ search
...
Adds an additional grid search step that starts from the zero motion
vector after the normal grid search. The search range for this step is
half of the normal range.
2018-03-01 14:02:13 +02:00
Arttu Ylä-Outinen
74a413c46a
Switch to star refinement in TZ search
2018-03-01 13:06:14 +02:00
Arttu Ylä-Outinen
ebee428ee1
Add loop termination to TZ grid search
...
Terminates the grid search if no better motion vector was found in the
last three iterations.
2018-03-01 13:06:06 +02:00
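The termination rule from ebee428ee1 can be sketched like this. It is a simplified model, not the encoder's search: the doubling of the distance per iteration, the callback type `ring_cost_fn`, and the helper `demo_cost` are assumptions made for illustration. Only the rule "stop after three iterations without a better result" comes from the commit message.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical callback: best cost found among the points at a distance. */
typedef unsigned (*ring_cost_fn)(int distance, void *ctx);

/* Returns the distance with the lowest cost; stops early when the last
 * three iterations did not improve on the best cost so far. */
static int grid_search(int search_range, ring_cost_fn best_cost_at,
                       void *ctx, unsigned *best_cost)
{
  int rounds_without_gain = 0;
  int best_dist = 0;
  *best_cost = UINT_MAX;

  for (int dist = 1; dist <= search_range; dist *= 2) {
    unsigned cost = best_cost_at(dist, ctx);
    if (cost < *best_cost) {
      *best_cost = cost;
      best_dist = dist;
      rounds_without_gain = 0;
    } else if (++rounds_without_gain >= 3) {
      break; /* no better motion vector in the last three iterations */
    }
  }
  return best_dist;
}

/* Demo cost with its minimum at distance 2; ctx counts evaluations. */
static unsigned demo_cost(int distance, void *ctx)
{
  int *calls = ctx;
  (*calls)++;
  int d = distance - 2;
  return (unsigned)(d < 0 ? -d : d) + 10;
}
```

With `demo_cost` and a range of 64, the loop evaluates distances 1, 2, 4, 8 and 16 and then stops, skipping 32 and 64.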
Arttu Ylä-Outinen
4c175621dd
Fix TZ grid search and star refinement
...
- Changes TZ grid search and star refinement to keep the origin constant
instead of moving to the best position after each iteration.
- Changes star refinement to loop until there is no more improvement,
instead of running the step only once.
2018-03-01 12:56:57 +02:00
Arttu Ylä-Outinen
9c2d0074a2
Add rounding of motion vectors in inter search
...
When the starting point for integer motion estimation was selected among
the merge candidates, the candidate motion vectors were always rounded
down. This commit changes the rounding so that they are rounded to the
nearest integer MV instead.
2018-03-01 09:39:21 +02:00
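The difference between the two rounding modes in 9c2d0074a2 can be sketched as follows, assuming motion vectors are stored in quarter-pixel units (an assumption not stated in the commit message). The helper names are hypothetical, and "rounded down" is modelled here as rounding toward negative infinity.

```c
#include <assert.h>

/* Floor division by 4 that is well defined for negative values. */
static int floor_div4(int v)
{
  return (v >= 0) ? v / 4 : -((-v + 3) / 4);
}

/* Old behaviour (as modelled here): always round down to integer pel. */
static int mv_round_down(int mv_qpel)
{
  return floor_div4(mv_qpel);
}

/* New behaviour: round to the nearest integer pel (halves round up). */
static int mv_round_nearest(int mv_qpel)
{
  return floor_div4(mv_qpel + 2);
}
```

For a candidate at 1.75 pixels (7 in quarter-pel units), rounding down starts the integer search at 1 while rounding to nearest starts it at 2, which is closer to the candidate.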
Ari Lemmetti
662430d441
Select CU type based on SSD, transform unit tree and mode cost of luma and chroma on --rd=2
2018-02-22 19:26:48 +02:00
Arttu Ylä-Outinen
cb06cfeadb
Drop temporary arrays in bipred search
...
Changes bipred search to use the original source and reconstruction
arrays directly instead of copying them.
2018-02-14 11:20:51 +02:00
Arttu Ylä-Outinen
0ea516ba30
Move bipred search to a separate function
2018-02-14 09:56:53 +02:00
Arttu Ylä-Outinen
6f506be12d
Drop dynamic allocation from bipred search
...
Moves the temporary LCU struct used in bipred search from the heap to
the stack. The single malloc call was a huge bottleneck in bipred.
2018-02-14 09:55:02 +02:00
Arttu Ylä-Outinen
7155dd0db7
Add negative references to L1 list
...
Changes reference index list creation so that the negative references
are added to L1 in addition to L0 when biprediction is enabled and no
reordering of pictures is done. Biprediction can now be used with the
low-delay GOP structure.
2018-02-07 14:54:52 +02:00
Arttu Ylä-Outinen
4b24cd03a2
Update for crypto++ 6.0.0 compatibility
...
Changes the crypto module to use unsigned char instead of byte. The byte
typedef is no longer included in the global namespace in crypto++ 6.0.0.
See https://github.com/weidai11/cryptopp/issues/442 .
Fixes #184 .
2018-02-05 13:35:03 +02:00
Arttu Ylä-Outinen
8c53417006
Check zero coefficient cost for inter
...
Checks the cost of flushing all coefficients of an inter block to zero.
This is much faster than doing full RDOQ but can still reduce bitrate
significantly. Encoding speed is increased since fewer coefficient bits
have to be coded with CABAC.
2018-01-29 12:41:56 +02:00
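The check described in 8c53417006 amounts to comparing two rate-distortion costs. The sketch below is a simplified model under assumptions: the function name, the use of SSD as distortion, and ignoring the bits needed to signal an all-zero block are illustrative choices, not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when dropping the whole residual is no worse, in RD terms,
 * than coding it.
 *   ssd_pred:  distortion of the prediction alone (all coefficients zero)
 *   ssd_recon: distortion after reconstructing with coded coefficients
 *   bits:      estimated bits for the coded coefficients
 *   lambda:    rate-distortion trade-off factor */
static int should_zero_residual(uint64_t ssd_pred, uint64_t ssd_recon,
                                double bits, double lambda)
{
  double cost_zero = (double)ssd_pred; /* signalling bits ignored here */
  double cost_coded = (double)ssd_recon + lambda * bits;
  return cost_zero <= cost_coded;
}
```

This is a single comparison per block, which is why it is much cheaper than full RDOQ, where the decision is made per coefficient.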