Arttu Makinen
f7fe8d9a27
Added more CC ALF functions.
...
Currently not working.
2020-12-30 16:21:59 +02:00
Arttu Makinen
9ed5169919
Finished functions get_blk_stats_cc_alf and calc_covariance_cc_alf for CC ALF.
2020-12-30 16:21:29 +02:00
Arttu Makinen
bf8bb62e50
Got rid of fair amount of global variables.
2020-12-30 16:21:28 +02:00
Arttu Makinen
7846796a4e
Removed #define FULL_FRAME.
2020-12-30 16:20:25 +02:00
Arttu Makinen
7bfb1ca6b4
Removal of useless comments.
2020-12-30 16:19:57 +02:00
Arttu Makinen
529bdb4dd2
Modify APS header writing.
2020-12-30 16:19:47 +02:00
Arttu Makinen
ee70bcfaec
Fixing warnings.
2020-12-30 16:19:07 +02:00
Arttu Makinen
d7eafc391f
Fixing uninitialized parameters.
2020-12-30 16:18:24 +02:00
Arttu Makinen
36ffdcaf3f
Disable output of debug stats.
2020-12-30 16:18:09 +02:00
Arttu Makinen
98768061db
Adding CC ALF.
2020-12-30 16:18:08 +02:00
Arttu Makinen
da04fffaec
Updated the creating of ALF parameters and init for them.
2020-12-30 16:17:54 +02:00
Arttu Makinen
bfa77e35c3
Fixed a bug where reconstruction for ALF was called multiple times for no reason.
...
Modified reconstruction of pixels after ALF search.
2020-12-30 16:17:43 +02:00
Arttu Makinen
bd292dab16
Fixed coding of headers for inter coding with ALF.
2020-12-30 16:15:12 +02:00
Arttu Makinen
26dc5b8c4e
Multiple APSs can now be signaled.
...
Can't test usage of multiple APSs properly because inter coding doesn't work.
2020-12-30 16:13:56 +02:00
Arttu Makinen
4ffb0b71a6
Chroma filtering works.
...
Also some code cleaning.
2020-12-30 16:13:25 +02:00
Arttu Makinen
a95fd73668
At least one APS can be signaled.
...
Problem with APS was in encoder_state-bitstream.c.
Cleaning of code.
2020-12-30 16:12:56 +02:00
Arttu Makinen
d7126520b2
Moving param_set_map from slices to cfg.
...
Bug fix in kvz_alf_encoder_ctb.
2020-12-30 16:12:38 +02:00
Arttu Makinen
c55a2a04e8
Bug fix in kvz_alf_encoder.
...
New bugs appeared with this fix.
2020-12-30 16:12:17 +02:00
Arttu Makinen
8aa91f320a
Bug fixes and cleaning.
2020-12-30 16:11:36 +02:00
Arttu Makinen
bfba8d43cb
Working on getting APS to work for ALF.
2020-12-30 16:10:01 +02:00
Arttu Makinen
b3ecc755e2
ALF search is now executed for full frame. Works for only 1 frame.
...
Checksum matches.
APSs are not used currently.
#define FULL_FRAME in alf.h is set to 1 in order to use ALF for full frame.
#define FULL_FRAME 0 produces working bitstream but checksum doesn't match.
2020-12-30 16:08:46 +02:00
Arttu Makinen
94787acb73
Divided the encoder_state_worker_encode_lcu function in encoderstate.c into encoder_state_worker_encode_lcu_search and encoder_state_worker_encode_lcu_bitstream.
...
ALF off. No changes in bitstream.
2020-12-30 16:07:46 +02:00
Arttu Mäkinen
ec62ed89cb
LCUs now have mismatches only on boundaries.
...
Fixed a bug in alf.c line 5451.
Modifications to copying the boundary pixels of CTU.
2020-12-30 16:07:45 +02:00
Arttu Mäkinen
f202aa43fa
WIP Updating VTM8.2 to VTM10.0.
...
Small update to ALF cabac flags.
Minor variable definition updates.
2020-12-30 16:07:44 +02:00
Arttu Mäkinen
bc90b731a5
ALF updated to VTM8.2. Checksum doesn't match.
...
ALF currently uses only predefined coefficients, not APSs.
Produces a valid bitstream, but checksum doesn't match.
CC ALF is disabled.
2020-12-30 16:06:59 +02:00
Arttu Mäkinen
2f80216514
Some cleaning and updating.
...
Set to use only existing filters rather than signal APS.
2020-12-30 16:02:01 +02:00
Arttu Mäkinen
a430d48669
ALF works now with VTM7.0 as in VTM6.1.
...
VTM properly decodes bitstream from kvazaar but the checksum doesn't match.
A couple of hard-coded values are needed for this in function "kvz_encode_alf_bits".
2020-12-30 15:59:08 +02:00
Arttu Mäkinen
7250f4549b
Merge fixes.
2020-12-30 15:12:32 +02:00
Arttu Mäkinen
21a4751875
Works with VTM decoder with one frame with one hard coded value.
...
APS NAL unit type writing added.
Bug fixes.
WIP.
2020-12-30 15:11:17 +02:00
Arttu Mäkinen
9cad95c94c
Bug fixes.
...
WIP.
2020-12-30 15:09:13 +02:00
Arttu Mäkinen
09c68d9de6
Outputs valid frame with kvazaar. Still problems with cabac when decoding with VTM.
...
Decided to use buffers that were added in last commit.
Some small fixes and adjustments.
WIP.
2020-12-30 15:09:12 +02:00
Arttu Mäkinen
2cac901cca
Testing different kind of buffer for alf image fulldata.
...
WIP
2020-12-30 15:09:12 +02:00
Arttu Mäkinen
feb201986a
Changed to process one CTU at a time rather than all CTUs.
...
WIP
2020-12-30 15:09:11 +02:00
Arttu Mäkinen
b04bb66160
Adjustments and cleaning.
...
WIP
2020-12-30 15:09:10 +02:00
Arttu Mäkinen
c76c445142
Cabac/ctx calculation added.
...
Bug fixing and adjusting.
WIP
2020-12-30 14:32:01 +02:00
Arttu Makinen
ade4fc4061
Update of contexts of ALF.
...
WIP
2020-12-30 14:32:00 +02:00
Arttu Makinen
ebb99a7223
Changed 'width's to 'stride's, because added more pixels to 'fulldata'.
...
Also some small fixes and changes.
Checksum correct in luma.
WIP
2020-12-30 14:30:47 +02:00
Arttu Makinen
377aa989ab
Updated to VTM6.1.
...
Done according to all #ifs enabled
2020-12-30 14:27:15 +02:00
Arttu Makinen
0fbbf1a7e2
Small fixes/adjustments
2020-12-30 14:25:58 +02:00
Arttu Makinen
98a8e78e93
avx2/encode_coding_tree-avx2.c update, because it caused errors
2020-12-30 14:25:16 +02:00
Arttu Makinen
ed76650fa5
Updating to VTM6.0
2020-12-30 14:25:09 +02:00
Arttu Makinen
a24f49c286
Doesn't crash anymore during debug. Added new allocator for fulldata in kvz_picture.
2020-12-30 14:24:16 +02:00
Arttu Makinen
2b7a8af23a
Crashes now in kvz_image_free.
2020-12-30 14:22:38 +02:00
Arttu Makinen
05495bb555
Not working. All the functions done.
...
Heap corruption occurs during debugging.
2020-12-30 14:22:30 +02:00
Arttu Mäkinen
236224dbb9
Broken version with header mismatch
2020-12-30 14:07:34 +02:00
Arttu Mäkinen
06233b5d3b
added alf parameter to cli
2020-12-30 14:02:58 +02:00
Jaakko Laitinen
71751c3770
Fix max filter size derivation
2020-12-29 17:57:35 +02:00
Jaakko Laitinen
6a8d73252a
Fix runtime errors
2020-12-28 16:41:00 +02:00
Jaakko Laitinen
85be89a85c
Fix compilation errors
2020-12-28 15:20:30 +02:00
Jaakko Laitinen
95ff22f0db
Finish max filter length fixes
2020-12-28 14:26:36 +02:00
Jaakko Laitinen
13e605153a
Fix bugs
2020-12-22 19:11:47 +02:00
Jaakko Laitinen
50e9acd3f4
Add max filter length derivation
2020-12-21 18:47:02 +02:00
Arttu Makinen
bc8507cc8d
MTS context.
2020-12-18 18:35:11 +02:00
Arttu Makinen
fd2f73b460
MTS headers and commands.
2020-12-18 17:40:47 +02:00
Jaakko Laitinen
7a71b700fb
Add chroma deblock filtering
2020-12-18 11:06:41 +02:00
Marko Viitanen
0c5e1db0fa
Fix wpp chroma bug
2020-12-15 22:59:22 +02:00
Marko Viitanen
071fe7fd51
Limit the top-right intra references when wpp is turned on
...
Chroma hash still fails.
2020-12-15 22:33:32 +02:00
Marko Viitanen
6146610ec8
Fix the wpp sync point to be the first LCU
2020-12-15 14:51:46 +02:00
Jaakko Laitinen
78be0ccd05
Fix chroma deblocking logic
2020-12-15 14:10:09 +02:00
Marko Viitanen
c07a56179f
Fix Hash SEI message for VTM11.0
2020-12-15 13:47:28 +02:00
Arttu Makinen
30c4065dc0
Headers for threading.
2020-12-15 13:04:39 +02:00
Jaakko Laitinen
6128db961a
Finish up large block filtering
2020-12-11 19:34:56 +02:00
Jaakko Laitinen
976d1c8812
Start implementing large block filtering
2020-12-10 18:03:18 +02:00
Jaakko Laitinen
33cea17484
Add logic for large block filtering
2020-12-09 19:10:38 +02:00
Jaakko Laitinen
d3d55933b2
Finish up strong filtering condition check
2020-12-08 18:38:05 +02:00
siivonek
e833354cdd
Merge branch 10-bit-assert-fix
2020-12-07 20:36:50 +02:00
Jaakko Laitinen
5a90deb678
Add initial max filter length and large block stuff
2020-12-07 18:54:43 +02:00
Jaakko Laitinen
03dade8246
Prepare for large blocks
2020-12-04 18:31:48 +02:00
Jaakko Laitinen
7b0b864947
Fix mvd thresholds and tc/beta index calculations
2020-12-04 15:54:40 +02:00
Jaakko Laitinen
8f3de705eb
Add todo list of things to check
2020-12-01 13:53:52 +02:00
Pauli Oikkonen
be19fd996b
Add default value for fast coeff table filename
...
..oops
2020-11-02 14:02:51 +02:00
Pauli Oikkonen
46301e9857
Document the --fast-coeff-table option
2020-10-29 15:23:26 +02:00
Pauli Oikkonen
816789c9f4
Allow fast coeff weights to be read from a file
2020-10-29 15:22:51 +02:00
Pauli Oikkonen
6799019db0
Move fast coeff table to transform.h
...
Guess this is a more logical place for it
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
4712ce5f59
Round the fast coeff result instead of flooring
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
0fb09c9920
New filtered coeff weight by QP values
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
9bf0cb27b1
Constrain fast cost estimation to QPs we have weights for
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
24d487f553
New weights for 12 <= QP <= 42
...
Trained using MSU ultrafast settings now
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
3e1c6d84b8
Fix issues in fast coeff estimation
...
Allow weight table to start from nonzero QP, and round weights to Q8.8
instead of flooring them
2020-10-29 15:20:27 +02:00
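The difference between flooring and rounding a weight to Q8.8 (8 integer bits, 8 fractional bits) can be sketched as follows. This is a minimal illustration and not the code from the commit: the names `q8_8_floor`, `q8_8_round` and `q8_8_self_check` are hypothetical, and non-negative weights that fit in 16 bits are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Q8.8 stores value * 256 in a 16-bit integer.
 * Hypothetical helper: convert by truncation (floor for w >= 0). */
static uint16_t q8_8_floor(double w)
{
  double scaled = w * 256.0;
  assert(scaled >= 0.0 && scaled < 65536.0);
  return (uint16_t)scaled;
}

/* Hypothetical helper: convert by rounding to the nearest step. */
static uint16_t q8_8_round(double w)
{
  double scaled = w * 256.0 + 0.5;
  assert(scaled >= 0.0 && scaled < 65536.0);
  return (uint16_t)scaled;
}

/* Small self-check documenting the expected behaviour. */
static void q8_8_self_check(void)
{
  assert(q8_8_floor(1.499) == 383); /* 383 / 256 = 1.49609375 */
  assert(q8_8_round(1.499) == 384); /* 384 / 256 = 1.5        */
}
```

Flooring always errs downward, by up to one step of 1/256. Rounding halves the worst-case error and removes the systematic bias.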
Pauli Oikkonen
5f91bda762
Use newer data for fast coeff cost estimation
...
Same training dataset, but this time only buckets 0...3 were used to
approximate the function, no sign/cg width bucket.
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
2abd733199
Use unsigned min() to correctly clip -32768
...
If a coeff happens to be -32768 (0x8000), its 16-bit abs() is also
0x8000. It should ultimately be clipped to 3, so interpret absolute
values as unsigned instead to make that happen.
2020-10-29 15:20:27 +02:00
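The -32768 corner case described above can be reproduced in scalar C. This is a sketch under assumptions, not the code from the commit: all function names are hypothetical, and the helpers only model what a 16-bit abs followed by a signed or unsigned min would yield.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a wrapping 16-bit abs: abs(-32768) stays 0x8000. */
static uint16_t abs16_wrap(int16_t v)
{
  uint16_t u = (uint16_t)v;
  return (v < 0) ? (uint16_t)(0u - u) : u;
}

/* Clip to 3 while interpreting the bits as signed: 0x8000 is -32768. */
static int clip3_signed(uint16_t abs_bits)
{
  int16_t s;
  memcpy(&s, &abs_bits, sizeof s); /* reinterpret the 16 bits as signed */
  return (s < 3) ? s : 3;
}

/* Clip to 3 while interpreting the bits as unsigned: 0x8000 is 32768. */
static int clip3_unsigned(uint16_t abs_bits)
{
  return (abs_bits < 3) ? abs_bits : 3;
}

/* Small self-check documenting the expected behaviour. */
static void clip3_self_check(void)
{
  uint16_t a = abs16_wrap(INT16_MIN);
  assert(a == 0x8000);
  assert(clip3_signed(a) == -32768); /* wrong: the value escapes the clip */
  assert(clip3_unsigned(a) == 3);    /* intended result */
}
```

For every other input the two interpretations agree, because the absolute value then fits in 15 bits.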
Pauli Oikkonen
b93b90c0d7
Implement new fast coeff cost estimator in AVX2
2020-10-29 15:20:27 +02:00
Pauli Oikkonen
2f74a112b3
Try first lookup table based fast coeff estimation
2020-10-29 15:20:27 +02:00
Marko Viitanen
2db3a07b14
Prevent cu_sig_model_chroma array from being indexed over the limit
2020-10-13 14:14:57 +03:00
Marko Viitanen
f4948dda6f
Fix array size for bdpcm_mode[]
2020-10-13 12:51:20 +03:00
Marko Viitanen
9e3e8f51f6
Change kvz_g_tc_table_8x8 from uint8_t to uint16_t to fit all the values
2020-10-13 12:05:27 +03:00
Marko Viitanen
26f4f45c6d
Use correct pred_mode cabac models -> fixes inter cabac bits
2020-10-13 12:04:31 +03:00
Marko Viitanen
5a6806cbf7
[CI] Limit testing parameters to those that work
2020-10-09 09:37:15 +03:00
Marko Viitanen
3c7eb55292
Disable output of cabac debug when in "count only" mode
...
- Some code cleanup
2020-10-09 08:45:43 +03:00
Marko Viitanen
fa25621c77
Force certain intra modes off
2020-10-09 08:44:40 +03:00
Marko Viitanen
54b8fd054d
Fix Chroma QP scaling issue
2020-10-02 15:40:23 +03:00
Marko Viitanen
11229997b6
Fix NAL header layer_id
2020-10-01 11:10:40 +03:00
siivonek
bc1206a4d3
Define qp_delta_min & max in global.h instead of calculating them locally.
2020-09-29 13:46:27 +02:00
Marko Viitanen
ac2032eb65
Fixing P/B frame headers and debug output formatting
2020-09-28 14:58:07 +03:00
Marko Viitanen
bddfb47a55
Merge remote-tracking branch 'remotes/kvazaar_github/master'
2020-09-25 11:49:11 +03:00
Marko Viitanen
551a3991cf
Cleanup headers
2020-09-24 09:31:44 +03:00
siivonek
0f3ef786b9
Modify delta QP range assert so it will work with any valid bit depth. Modify VAQ code so it will clip the QP to a proper range which is dependent on bit depth
2020-09-22 20:15:23 +02:00
siivonek
fe6f93a951
Fix delta QP range check assert. Add separate asserts based on bit depth.
2020-09-22 20:15:22 +02:00
Marko Viitanen
449975b0fb
Fixed cubic filter usage in intra angular modes
2020-09-21 14:58:34 +03:00
Joose Sainio
8143ab971c
Merge branch 'stats-files'
...
# Conflicts:
# src/cfg.c
# src/cli.c
# src/kvazaar.h
2020-09-16 09:25:00 +03:00
Joose Sainio
1c06bd7f3d
Fix POC to be correct for all GOPs and Intra periods, fix issue with vaq
2020-09-14 14:25:48 +03:00
Sami Ahovainio
4d87fb2397
fixed potential out of bounds iteration
2020-09-10 12:59:39 +03:00
Sami Ahovainio
5d521a2444
Added option to force yuv as file format and made the options and file endings case insensitive
2020-09-09 16:05:59 +03:00
Joose Sainio
3fb8b7ebc6
Add --stats-file-prefix option
...
When the option is given with a prefix, four files (prefixlambda.txt,
prefixqp.txt, prefixdist.txt, and prefixbits.txt) are written with the
corresponding data for each CTU. This is a debug feature.
2020-09-09 12:35:47 +03:00
Sami Ahovainio
84cabd9c20
Fixed sign match
2020-09-07 15:39:31 +03:00
Sami Ahovainio
d691849594
Added frame header reading for both read and seek functions
2020-09-07 15:31:08 +03:00
Sami Ahovainio
cbcee67821
y4m start header parsing ready
2020-09-07 15:31:07 +03:00
Joose Sainio
c10b841e7c
Merge remote-tracking branch 'remotes/origin/fix-sao-parameter' into master
2020-09-07 13:10:36 +03:00
Joose Sainio
da09d49890
Remove optionality from --sao
...
The SAO parameter was optional, so passing an argument required using "=",
which is confusing since no other parameter requires it.
2020-09-07 12:35:40 +03:00
Pauli Oikkonen
3f7f0d7ed7
Allow bit depth to be defined from the outside
...
For a 10-bit build, just use:
env CFLAGS="-DKVZ_BIT_DEPTH=10" ./configure && make clean && make
2020-09-02 17:55:22 +03:00
Pauli Oikkonen
780da4568a
Exclude 8-bit-only code from 10-bit builds and use uint8_t instead of kvz_pixel for code that assumes 8-bit pixels
2020-09-02 17:46:33 +03:00
Pauli Oikkonen
31ef4e4216
Fix ml functions to accept kvz_pixel*, not uint8_t*
2020-09-02 17:46:33 +03:00
Marko Viitanen
574c4d06ee
Fix use of log2_cg_size in coeff coding -> smaller blocks also decoded correctly
2020-08-27 18:26:16 +03:00
Marko Viitanen
b3f3a9eae6
Add two EOS NAL units at the end of each picture to make intra sequence work
2020-08-25 15:30:21 +03:00
Marko Viitanen
b7638172ca
Use continuous POC for all intra and add aud_irap_or_gdr_au_flag
2020-08-25 11:53:55 +03:00
Marko Viitanen
b53b53ed09
Fixed SAO headers, SAO produces valid output
2020-08-20 15:37:29 +03:00
Marko Viitanen
b4907e6337
Fix deblocking headers and some cleanup, deblocking does not produce valid output
2020-08-20 15:25:18 +03:00
Arttu Mäkinen
4da90b3722
Update of contexts.
2020-08-17 18:18:35 +03:00
Arttu Mäkinen
232332dc5f
Update of contexts.
2020-08-17 14:23:26 +03:00
Marko Viitanen
2fc8558926
Set correct profile, level and inter flags in IDR
2020-08-17 11:51:57 +03:00
Marko Viitanen
0f8ada02c4
Fix VPS writing
2020-08-17 11:26:09 +03:00
Arttu Mäkinen
da9f542209
WIP updating VTM8.2 to VTM10.0rc
2020-08-17 10:27:03 +03:00
Joose Sainio
faf5cc858d
Merge branch 'fix-lp-gop-rc'
2020-06-25 09:41:57 +03:00
Joose Sainio
138651ee85
Fix the bit and frame counts for calculating the gop allocation
...
Additionally dynamically adjust the smoothing window if there are rapid changes
2020-06-24 15:26:54 +03:00
Ari Lemmetti
f8ff6dd567
Merge pull request #262 from jbeich/truncate-freebsd
...
Unbreak build on FreeBSD
2020-06-22 18:08:01 +03:00
Ari Lemmetti
d1abf85229
Add MV constraint check to motion estimation start point
2020-06-01 23:51:38 +03:00
Marko Viitanen
20b66c9949
Sync to VTM 8.2 and add separate height to last_sig coding
2020-04-29 08:52:38 +03:00
Jan Beich
1fa69c705d
Rename truncate() from 30ce461d98
to avoid conflict with POSIX version
...
strategies/avx2/dct-avx2.c:55:23: error: static declaration of 'truncate' follows non-static declaration
static INLINE __m256i truncate(__m256i v, __m256i debias, int32_t shift)
^
/usr/include/stdio.h:448:6: note: previous declaration is here
int truncate(const char *, __off_t);
^
2020-04-22 16:09:42 +00:00
Ari Lemmetti
9753820b3a
Update version to 2.0.0
2020-04-22 01:03:36 +03:00
Ari Lemmetti
40e81f3243
Update preset tables. Update docs.
2020-04-22 01:03:21 +03:00
siivonek
54f438a75c
Update VAQ help text. Update docs. Change some lingering tabs to spaces.
2020-04-20 16:52:07 +02:00
Marko Viitanen
86d76b19a4
Fix intra neighboring block selection and clean some unused code
2020-04-16 14:12:40 +03:00
Marko Viitanen
27b4dd50f8
Fix picture header to code Inter frame
2020-04-14 08:24:11 +03:00
Ari Lemmetti
f31dddc019
Bypass inverse quantization and inverse transform when trying early skip
2020-04-10 16:02:09 +03:00
Pauli Oikkonen
fbdb1e2d15
Add correct path to sao_shared_generics.h in makefile
2020-04-08 19:27:12 +03:00
Pauli Oikkonen
8617530b13
Use _mm_store_epi64 instead of _mm_cvtsi128_si64
...
Fix 32-bit builds that tend to lack the cvt intrinsic. Hope it will be
optimized to a movq r64, xmm on modern platforms though
2020-04-07 23:51:54 +03:00
Pauli Oikkonen
a82966c0f5
Fix lacking _mm256_cvtss_f32 intrinsic on VS
...
Cast __m256 into __m128 first, the XMM variant of the intrinsic has been
around for a long enough time to be supported
2020-04-07 22:38:10 +03:00
Marko Viitanen
27ffba2c9c
Fix terminating bit condition at the end of the slice
2020-04-07 15:30:02 +03:00
Marko Viitanen
e737a878a6
Fix split flags and remove an extra terminating bit
2020-04-07 09:57:30 +03:00
Joose Sainio
c369ff8873
Fix a potential division by zero in a floating point operation
...
When C is calculated from K without clipping K first, K can in some cases
become such a large negative value that bpp^K is rounded to zero. In
real-life cases this is extremely rare, and clipping beforehand has very
little to no effect.
Also remove commented debug prints
2020-04-06 11:05:49 +03:00
Ari Lemmetti
901c25c0c8
Merge branch 'vaq'
2020-04-03 19:51:17 +03:00
Ari Lemmetti
51451be5ef
Handle cases where the number of pixels is not divisible by 32
2020-04-03 19:37:47 +03:00
siivonek
ee544304f1
Make function static to not mess up tests.
2020-04-03 15:22:34 +02:00
siivonek
e5267f7706
Fix define for use with Visual Studio.
2020-04-03 15:11:01 +02:00
siivonek
9e34369304
Merge branch 'vaq' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into vaq
2020-04-03 12:35:04 +02:00
siivonek
d025977949
Clamp edge lcu pixels if dimensions are not 64 divisible.
2020-04-03 12:33:14 +02:00
Pauli Oikkonen
addc1c3ede
Fix warning about potentially unused hsum_8x32b
...
There's a lot of alternative options available, such as making it
globally visible with a kvz_ prefix, force inlining it, or anything.
This could be good too, hope it won't be compiled at all to translation
units where it's not used.
2020-04-02 16:44:22 +03:00
siivonek
e3ba0bfb8c
Fix memory leak.
2020-04-02 14:15:36 +02:00
siivonek
566680af7b
Move function hsum to file where it is used to avoid errors.
2020-04-02 14:03:06 +02:00
siivonek
58be514e2a
Fix pipeline error.
2020-04-02 13:50:08 +02:00
siivonek
2aa0d97589
Add VAQ test in test_tools. Bump minor version number in configure.ac. Update help text for VAQ.
2020-04-01 18:16:39 +02:00
siivonek
c6e421019e
Merge vaq-simd
2020-03-31 21:40:29 +02:00
Jaakko Laitinen
8e4b738900
Fix error when first value in pu depth list is omitted
2020-03-31 16:57:12 +03:00
Jaakko Laitinen
54ef0bbfd2
Fix unintended functionality when giving multiple --pu-depth-intra/inter list parameters
2020-03-31 16:39:56 +03:00
Jaakko Laitinen
cb0c7b23b5
Merge branch 'intra_qp_offset_auto' into 'master'
...
Add auto option to intra-qp-offset
See merge request TIE/ultravideo/kvazaar!7
2020-03-31 16:17:36 +03:00
Pauli Oikkonen
99889dab15
Fix switch(bool) in picture-avx2.c
...
It passes on GCC but warns on Clang
2020-03-31 15:42:19 +03:00
Jaakko Laitinen
e0440c3de1
Update docs
2020-03-31 15:27:48 +03:00
Jaakko Laitinen
7760dcf441
Remove intra qp offset from preset parameters
2020-03-31 14:06:07 +03:00
Jaakko Laitinen
8bd1a2b667
Update help message
2020-03-31 13:19:05 +03:00
Jaakko Laitinen
b4f5486190
Set intra qp offset default to auto
2020-03-31 12:58:40 +03:00
Jaakko Laitinen
740688c67d
Add auto option to intra qp offset
2020-03-31 11:56:44 +03:00
Marko Viitanen
a0af87bdc0
Update contexts to match VTM 8.0
2020-03-30 14:34:50 +03:00
Marko Viitanen
d36ba85861
Fixed PPS and slice header to match VTM 8.0 (only for I-Frame!)
2020-03-30 12:55:12 +03:00
Marko Viitanen
64b9177cf0
Fix SPS to match VTM 8.0
2020-03-30 09:56:38 +03:00
Pauli Oikkonen
0c7bfa7dc9
Fix AVX2 on Clang
...
Besides just -mavx2, AVX2 support depends on a couple minor instruction
set extensions that should always exist on AVX2-capable hardware. Too
bad the different bit twiddling instructions are invoked slightly
differently between GCC and Clang, but now Clang seems to also produce
an AVX2-capable build.
2020-03-26 18:48:48 +02:00
siivonek
89d3e674ce
Comment out code which possibly messes up OBA
2020-03-26 17:49:31 +02:00
siivonek
be7d9ddec5
Fix error in frame variance calculation. Chroma channels were not added to variance
2020-03-26 14:33:00 +02:00
Marko Viitanen
8908324df8
Fix PTL DPB HDR param headers to match VTM 8.0
2020-03-26 10:40:27 +02:00
Marko Viitanen
d622ebb1f4
Fix NAL types to match VTM 8.0
2020-03-26 10:39:35 +02:00
Jaakko Laitinen
45ca8f8113
Merge branch 'master' into 'extended_pu-depths'
2020-03-25 15:11:08 +02:00
siivonek
5986e71535
Fix mistake
2020-03-20 13:43:44 +02:00
Jaakko Laitinen
d6ffe9e495
Update docs
2020-03-20 13:27:07 +02:00
Jaakko Laitinen
621450cc1d
Update --help
2020-03-20 13:07:48 +02:00
Jaakko Laitinen
aaac3df69b
Add prefix to kvazaar.h define
2020-03-20 09:04:00 +02:00
siivonek
2a85be5752
Move qp_to_lambda so it is defined before use. Change some tabs to spaces
2020-03-19 22:13:53 +02:00
siivonek
0a4ce3c0aa
Add vaq to new rate control
2020-03-19 21:43:52 +02:00
siivonek
1bbc598d75
Merge branch 'master' into vaq
2020-03-19 20:19:43 +02:00
Joose Sainio
b53911d637
Merge branch 'rc-intra'
2020-03-19 13:34:15 +02:00
Joose Sainio
a304a8ea6e
Add weights for GOP 16 based on fitting a power curve to bits spent by HM
2020-03-19 11:13:43 +02:00
Joose Sainio
e823ac1dae
miscellaneous fixes
...
- bump library version
- add help text for --clip-neighbour
- update the default values of --clip-neighbour and --intra-bits
- update tests to be more sensible
2020-03-19 10:47:28 +02:00
Jaakko Laitinen
b2ddba38c2
Set correct size for pu-depth min/max data structure
2020-03-19 09:29:43 +02:00
Joose Sainio
2c345bc3cf
try to fix tsan issue
2020-03-18 14:58:54 +02:00
Jaakko Laitinen
fe428dcbe1
Fix no gop functionality
2020-03-18 11:03:33 +02:00
Jaakko Laitinen
af3d559d8d
Let pu-depth be defined per gop-layer
2020-03-17 17:57:18 +02:00
Ari Lemmetti
cbd77944d8
Costs in rough intra search may be negative. Get rid of UBSan error.
2020-03-16 22:13:14 +02:00
Ari Lemmetti
aa0ade3f65
Cast values to unsigned to make UBSan not trigger due to left-shifting negatives
2020-03-16 19:52:34 +02:00
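Left-shifting a negative signed integer is undefined behaviour in C, which is what UBSan reports. A minimal sketch of the cast-to-unsigned workaround follows. The helper name `shl_via_unsigned` is hypothetical and the actual change in the commit may look different.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift the unsigned representation instead of the
 * signed value, so the shift itself is always well defined. */
static int32_t shl_via_unsigned(int32_t v, unsigned shift)
{
  assert(shift < 32);
  /* Converting a result above INT32_MAX back to int32_t is
   * implementation-defined, not undefined; mainstream two's complement
   * compilers wrap it around. */
  return (int32_t)((uint32_t)v << shift);
}
```

On such compilers `shl_via_unsigned(-3, 2)` gives -12, the same value the signed shift was expected to produce.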
RLamm
27fe716654
Fixed reference POC indexing
2020-03-11 15:33:37 +02:00
RLamm
bf24831780
Attempt to fix random crashes
2020-03-11 15:31:47 +02:00
RLamm
887659db1f
Attempted to scale the extra_mvs
2020-03-11 15:31:46 +02:00
siivonek
8d9719ff90
Merge branch 'master' into vaq
2020-03-05 14:17:01 +02:00
Joose Sainio
c9a8f2a596
Completely disable intra based model for frame 1
2020-03-04 12:52:13 +02:00
Joose Sainio
19c79c3e58
don't use the intra frame based estimation if the result is bad
2020-03-04 09:26:22 +02:00
Ari Lemmetti
7b7358c25a
Update presets veryslow and placebo a bit
...
Both use now --gop 16, --intra-qp-offset -3, --me tz, and --transform-skip
2020-03-03 20:41:01 +02:00
Pauli Oikkonen
60e7956dc5
Disable inaccurate integer variance calculation for now
2020-03-02 19:18:55 +02:00
Pauli Oikkonen
fc1b91335b
Implement variance calculation in integer math
...
Maybe this is a bit faster than FP, it's not accurate though
2020-03-02 18:17:18 +02:00
Pauli Oikkonen
35c825c75f
Move hsum_8x32b to avx2_common_functions
2020-02-27 17:52:17 +02:00
Pauli Oikkonen
b00ac7d1c4
AVX2 version of buffer variance calculation
2020-02-25 15:57:56 +02:00
siivonek
a380e43bda
Add chroma channels to variance calculation.
2020-02-24 19:54:34 +02:00
Pauli Oikkonen
1bd9c6dd93
Make a strategy out of pixel_var
2020-02-24 19:37:36 +02:00
Pauli Oikkonen
86ebf366e1
fix typo
2020-02-24 18:18:10 +02:00
Joose Sainio
f81de41775
Merge branch 'master' into rc-intra
2020-02-24 15:30:57 +02:00
siivonek
5688bcd646
Merge branch 'master' into vaq
2020-02-21 17:11:10 +02:00
siivonek
908ecb1767
Add rounding to aq offsets. Fix typo
2020-02-21 13:51:43 +02:00
Ari Lemmetti
1dfc69b42e
Consider merge index bits in merge analysis and early skip
2020-02-20 09:43:58 +02:00
Joose Sainio
7deb22c8e8
Merge branch 'master' into rc-intra
2020-02-19 15:01:04 +02:00
Kari Siivonen (TAU)
c972ca9067
Add assert to check if deltaQP out of bounds. Clip adaptive QP to [-13, 12].
2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)
f07990794f
Fix error in vaq pixel blit range calculation
2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)
57ed40c263
Fix application of aq offset
2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)
be2f420d61
Change: vaq requires a parameter. The parameter defines vaq strength, e.g. 15 == 1.5
2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)
bf1b2c1e22
Add define for vaq strength parameter
2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)
150559a7e8
Fix bugs. Enable set_qp_in_cu when using vaq
2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)
c8c71274ee
Change tabs to spaces.
2020-02-18 13:20:26 +02:00
siivonek
888382953d
Implement calculation of vaq values. Values not used yet.
2020-02-18 13:20:25 +02:00
siivonek
ad40a88c09
Add no-vaq option to vaq
2020-02-18 13:20:25 +02:00
siivonek
09f0a1c52e
Fix typo in comment
2020-02-18 13:20:25 +02:00
siivonek
84fb3fd7d1
aq: Add --vaq commandline option
2020-02-18 13:20:25 +02:00
Joose Sainio
2a98f5db1e
fix intra-bits for lp-gop
2020-02-18 10:38:29 +02:00
Ari Lemmetti
71d9327f62
Further improve fast bipred
2020-02-17 20:32:52 +02:00
Ari Lemmetti
80c26870d5
Update docs
2020-02-15 23:29:18 +02:00
Ari Lemmetti
ebb183cc01
Add option to make intra QP offset configurable
2020-02-15 22:54:48 +02:00
Ari Lemmetti
be3e08d6db
Add gop.h to Makefile
2020-02-15 22:54:47 +02:00
Ari Lemmetti
1354acd358
Prevent negative values being written to SPS with --gop=0
2020-02-15 22:54:47 +02:00
Ari Lemmetti
fe4869916c
Disable GOP and intra qp offset for all-intra coding automatically
2020-02-15 22:54:46 +02:00
Ari Lemmetti
9849fb7c77
Enable experimental rate control for GOP 16
2020-02-15 22:54:46 +02:00
Ari Lemmetti
a0a22dec8a
Remove deprecated / unused lambda adjustments
2020-02-15 22:54:46 +02:00
Arttu Ylä-Outinen
829a70e6a7
Copy lowdelay GOP definition from HM
2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen
28f99c0b87
Change definition of 8-GOP to match HM
2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen
636fa8fbdd
Fix maximum decoded picture buffer size
2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen
ebd5156db5
Add definition for random access GOP of length 16
2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen
6653f06dd0
Only compute GOP layer weights when RC is enabled
2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen
c8fff1e0d6
Use a larger number of bits for POC lsb when needed
...
Changes the number of bits used for coding the least significant bits of
the POC based on the GOP size.
2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen
d757a832c2
Change GOP QP offset handling to match HM
...
Adds fields qp_model_scale and qp_model_offset to kvz_gop_config and
intra_qp_offset to kvz_config.
2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen
f37dcd5879
Move GOP definition to a separate file
...
Moves definition of the 8-GOP from cfg.c to gop.h.
2020-02-15 22:36:55 +02:00
Ari Lemmetti
6e1007a3e7
Get rid of LAMBA! (Commit #3000 )
2020-02-15 22:32:52 +02:00
Ari Lemmetti
0c02e71b43
Remove minor error from readme
2020-02-15 22:29:08 +02:00
Joose Sainio
e90d3141a2
Merge branch 'master' into rc-intra
2020-02-05 11:06:56 +02:00
Ari Lemmetti
9a0236bb4e
Add option 'zero-coeff-rdo'
2020-02-04 21:26:29 +02:00
Ari Lemmetti
886ff36d12
Initial implementation of fast bipred.
2020-02-04 15:46:23 +02:00
Ari Lemmetti
3c7dd0752f
Remove the broken "no mov" branch.
...
Causes hash mismatches for example in SlideShow sequence.
2020-02-03 15:26:31 +02:00
RLamm
bf8941ddb8
Added comment about partial-coding usage
2020-01-31 16:19:48 +02:00
RLamm
b8488ab48d
Changed "partial-coding" variables to uint32_t
2020-01-31 16:02:29 +02:00
RLamm
76e3249754
Changed parameter "slicer" to "partial-coding" to avoid confusion.
2020-01-31 14:22:32 +02:00
RLamm
30d5df40c5
Custom headers for the distributed coding
2020-01-29 15:54:49 +02:00
Joose Sainio
54571529a4
Fix accessing previous frame that didn't exist
2020-01-17 10:48:35 +02:00
Joose Sainio
5c671d20e1
Use the new clipping only in situations where it actually helps
2020-01-17 09:08:21 +02:00
Joose Sainio
3c34d7c863
Fix qp estimation and checking of previous frames that don't exist
2020-01-15 09:32:04 +02:00
Joose Sainio
1a35c22a52
Change clipping of lambda and qp for ctus on OBA rc
...
instead of clipping qp and lambda to the value of last value from the state
clip to previous frame with same layer and if such frame doesn't exist, clip
to previous frame
2020-01-14 14:46:05 +02:00
Pauli Oikkonen
c3d9e97e9f
Fix VS build
2019-12-12 18:34:55 +02:00
Pauli Oikkonen
7f238ca299
Remove debug print functions
...
Whoops
2019-12-12 18:19:31 +02:00
Pauli Oikkonen
eefb5e50b3
De-inline pred_filtered_dc functions, shouldn't make much difference though
2019-12-12 17:30:00 +02:00
Pauli Oikkonen
169314de4f
32x32 filtered DC prediction in AVX2
2019-12-11 18:17:06 +02:00
Pauli Oikkonen
fb2481b7e4
16x16 filtered DC implemented in AVX2
2019-12-10 15:54:50 +02:00
Joose Sainio
b78aa7b272
save c and k to frame
2019-12-06 10:52:54 +02:00
Joose Sainio
5b10e5fb7e
parameterize the clipping option
2019-12-06 09:51:04 +02:00
Pauli Oikkonen
da370ea36d
Implement AVX2 8x8 filtered DC algorithm
2019-11-28 14:10:10 +02:00
Pauli Oikkonen
5d9b7019ca
Implement a 4x4 filtered DC pred function
2019-11-26 17:05:54 +02:00
Joose Sainio
ca0060cbba
try the original clipping
2019-11-26 15:13:04 +02:00
Pauli Oikkonen
f1485ab087
Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes?
2019-11-25 15:20:29 +02:00
Joose Sainio
ab2fded8af
Update threadwrapper to enable pthread_rwlock_t
2019-11-21 13:38:40 +02:00
Joose Sainio
eb78aead1f
Fix additional potential data races
2019-11-21 11:03:12 +02:00
Joose Sainio
35d7e0d88b
Fix data race
2019-11-21 10:25:04 +02:00
Marko Viitanen
94d89f03c7
Added cfg variable intra_smoothing_disabled and some cleanup
2019-11-20 08:38:33 +02:00
Marko Viitanen
eb2caf9118
Fix intra angle filter, changed from gauss filter table to run-time calculated 4-tap filter
2019-11-19 15:15:21 +02:00
Pauli Oikkonen
979d66031c
Create a strategy out of intra_pred_filtered_dc
2019-11-19 14:50:31 +02:00
Marko Viitanen
466d8772b0
Apply JVET_P0170_ZERO_POS_SIMPLIFICATION in coeff bypass coding
2019-11-19 14:32:38 +02:00
Joose Sainio
0e8815a3d8
test clipping qp to previous frame instead of previous ctus
2019-11-19 14:32:31 +02:00
Joose Sainio
ddb4e5a131
move the intra bit calculation so that it is used also with lambda rc
2019-11-19 14:16:48 +02:00
Joose Sainio
a07833f3e6
check that mallocs in rc initialization were successful
...
only call kvz_update_after_picture when using the OBA rc
2019-11-19 13:59:44 +02:00
Joose Sainio
50d410a316
re-enable static qp encoding and lambda rc
2019-11-19 13:45:58 +02:00
Pauli Oikkonen
fa4bb86406
Optimize intra_pred_planar_avx2 for 4x4 blocks
2019-11-19 13:39:02 +02:00
Marko Viitanen
3df2642b03
Fix qt cbf context init value
2019-11-19 13:27:36 +02:00
Joose Sainio
57e5615ece
Fix incorrect intra rc calculation skipping
2019-11-19 13:25:31 +02:00
Joose Sainio
6cc3bcd87e
Command line parameters for oba rc and implementation of the usage of the intra parameter
2019-11-19 09:29:06 +02:00
Joose Sainio
eb73548af5
Encode first frame completely before starting others to enable owf
2019-11-18 09:51:37 +02:00
Marko Viitanen
17a53230fd
Code cleanup, remove unused arrays and remove tabs
2019-11-18 09:01:23 +02:00
Pauli Oikkonen
4761d228f9
Start to vectorize the 4x4 loop
2019-11-15 17:32:40 +02:00
Pauli Oikkonen
8d45ab4951
Stupidify the 4x4 planar loop for vectorization
2019-11-14 17:14:04 +02:00
Marko Viitanen
91528f3292
Update contexts
2019-11-14 13:46:51 +02:00
Marko Viitanen
b309ed90be
Fix NAL packet and missing fields in SPS
2019-11-14 09:21:11 +02:00
Marko Viitanen
74514981a9
Fixed PPS, SPS and slice headers and NAL unit types
2019-11-13 15:59:36 +02:00
Joose Sainio
c759c138ed
Prepare the rc data structure to be shared among all frame encoders
2019-11-13 11:56:25 +02:00
Joose Sainio
cdb7c851a4
Fix weight calculation
2019-11-13 08:55:31 +02:00
Joose Sainio
b9b01f8036
WPP with threading
2019-11-12 12:12:57 +02:00
Joose Sainio
615973adca
should enable threading with wpp when owf is not used
2019-11-12 09:03:00 +02:00
Pauli Oikkonen
6f13f6525c
Merge branch 'new_prints'
2019-11-07 17:04:21 +02:00
Joose Sainio
d353f7dd1a
Disable debug prints, fix multiple bugs in the calculation
2019-11-07 15:08:57 +02:00
mercat
57e8c3ebc2
Merge branch 'ML-cplx_red_ICIP'
2019-11-07 13:25:47 +02:00
Pauli Oikkonen
558f0ec401
Mbps, not mbps
2019-11-05 18:06:00 +02:00
Pauli Oikkonen
2edf533925
Tidy the end report printing
...
Also fix a bug with non-integer target FPS
2019-11-05 17:20:00 +02:00
Joose Sainio
408fd4ccb6
Fix lambda and qp calculation for intra frames
...
also fixes a bug with selecting the clip neighbor lambda and clip neighbor qp
selection for inter frames
2019-11-05 10:51:39 +02:00
Pauli Oikkonen
c7313ce567
Store AVG QP information in encmain
2019-11-04 17:08:07 +02:00
Reima Hyvönen
80575c59bf
Some updates done to get right bitrate and avg QP
2019-10-31 15:56:24 +02:00
Reima Hyvönen
252bab8820
Added prints to bitrate and AVG QP
2019-10-31 15:56:24 +02:00
Pauli Oikkonen
6d7a4f555c
Also remove 16x16 (A * B^T)^T matrix multiply
...
Can be done using (B * A^T) instead, it's the exact same
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
2c2deb2366
Tidy AVX2 32x32 matrix multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
98ad78b333
Tidy the old AVX2 32x32 matrix multiply
...
It was actually a very good algorithm, just looked messy!
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
4a921cbdb5
Retain data as much in YMM registers as possible
...
This seems to make it a whole lot quicker
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
ac4d710e23
Unroll 32x32 matrix multiply, use all regs
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
a58608d0b8
Remove totally unnecessary (A * B^T)^T 32x32 multiply
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
043f53539f
Implement a streamlined matrix-multiply 32x32 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e9da2d851b
Tidy 32x32 fast DCT's helper functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e382339182
Implement fast (butterfly) 32x32 DCT in AVX2
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
b5962dadac
Tidy indentation in AVX2 16x16 iDCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
36a8f89025
Fine-tune 16x16 AVX2 iDCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
ca9409de2b
Implement 16x16 DCT as butterfly algorithm in AVX2
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
7c69a26717
Use aligned loads and stores for AVX2 DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
8e9c65dca6
Align DCT matrices and temp transform buffers
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
148a150522
Align DCT source and dest blocks to cache line
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
8e60bbf6a6
Slightly tune 16x16 forward DCT
...
Use an array of __m256i's to store temporary value, essentially letting
the compiler enforce alignment and use aligned loads and stores.
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
c0cc0e8a75
Optimize 16x16 multiply by only slicing right mat once
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
e463d27f22
Implement streamlined generic 16x16 matrix multiply
...
It can't be this fast for real, can it?
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
beb85ce9d6
Reorder parameters for 8x8 matrix multiplies
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
292af62256
Implement tailored 16x16 forward DCT
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
30ce461d98
Redo 4x4 matrix multiplication
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
07970ea82f
Streamline by-the-book 8x8 matrix multiplication
...
Also chop up the forward transform into two tailored multiply functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen
7ec7ab3361
Implement a tailored AVX2 8x8 DCT
2019-10-28 16:19:42 +02:00
Joose Sainio
372934c7db
Fix division by zero
2019-10-10 16:35:56 +03:00
Joose Sainio
9bdfdeaf5c
Rest of the owl
2019-10-09 15:48:58 +03:00
Joose Sainio
1ba8525faf
WIP
2019-10-09 10:35:07 +03:00
Joose Sainio
19496d2692
?
2019-10-03 14:50:11 +03:00
Joose Sainio
4b111e339e
fix couple of bugs in the implementation, bit calculation seems still bit off
2019-10-01 15:08:39 +03:00
Joose Sainio
84615e406a
fix compiler warnings
2019-09-27 14:20:08 +03:00
Joose Sainio
14b7a75713
Call the new functions and fix bugs
2019-09-27 14:14:24 +03:00
Joose Sainio
ef74bfb182
unify naming
2019-09-27 10:16:21 +03:00
Joose Sainio
e36f481bda
qp calculation for frame
2019-09-27 09:05:40 +03:00
Joose Sainio
47019ca1cd
intra ck update
2019-09-26 16:04:53 +03:00
Joose Sainio
7c8f4da7cb
Update c and k except after first intra
2019-09-26 13:09:28 +03:00
Joose Sainio
0577d481c1
CTU level code
2019-09-25 12:12:21 +03:00
pkubaj
1d7fcf4227
Fix build on powerpc64 with LLVM
2019-09-12 15:05:00 +02:00
mercat
0de567bfa4
Fix memory leak
2019-09-12 09:45:32 +03:00
mercat
fa116de619
Add static
2019-09-11 16:18:12 +03:00
mercat
b8753a9293
Fucking INLINE fixed
2019-09-11 16:12:07 +03:00
mercat
b855144e68
INLINE fix
2019-09-11 16:12:07 +03:00
mercat
694337b803
Add const and more const
2019-09-11 16:12:07 +03:00
mercat
21c07638ed
Remove const into kvz_init_constraint.
2019-09-11 16:12:06 +03:00
mercat
2bca507abe
Clean version of machine learning constraint code. (ICIP paper)
2019-09-11 16:12:06 +03:00
Alexandre Mercat
0f4b7be6ee
First version of ML ICIP code for master
2019-09-11 16:12:06 +03:00
Pauli Oikkonen
99597b828a
Work around the ancient Win32 calling convention hassle
...
See if this'll work now
2019-09-06 13:14:42 +03:00
Pauli Oikkonen
c5ca18950c
Revert "Revert to 6924d90052 due to broken visual studio build"
...
This reverts commit 1dd0619bd7.
2019-09-05 18:21:55 +03:00
Pauli Oikkonen
55529decd5
Implement _mm256_insert_epi32 and extract pseudo-ops
...
Visual Studio headers apparently lack these guys
2019-09-05 18:20:52 +03:00
Marko Viitanen
28dc4fa2ed
Fix intra MPM selection
2019-09-05 09:39:13 +03:00
Ari Lemmetti
147378e1f9
Prevent 8x4 and 4x8 bipred in merge analysis
2019-09-03 16:32:50 +03:00
Ari Lemmetti
ef1fdbf259
Separate prediction of single PU/PB from CU/CB
2019-09-03 16:32:50 +03:00
Joose Sainio
7d2737bdf6
WIP picture lambda calculation
2019-09-03 11:03:35 +03:00
Ari Lemmetti
3bc510712f
Enable merge analysis for smp and amp
2019-09-02 17:31:51 +03:00
Ari Lemmetti
557bcbc6aa
Make luma or chroma only inter "recon" or predict possible
2019-09-02 17:15:28 +03:00
Marko Viitanen
6d5e20ca13
Header changes to match VTM 6.1
2019-09-02 09:42:35 +03:00
RLamm
60be6d411c
Intra filtering fixed at least for luma. All intra modes output valid luma (hashes match), but chroma is still broken.
2019-08-30 16:14:00 +03:00
RLamm
83ac39094a
Use new PDPC filtering for planar and DC modes
2019-08-29 12:51:34 +03:00
Joose Sainio
131c04f65c
Fix incorrect weight for intra frame
2019-08-29 12:01:13 +03:00
Joose Sainio
8f96678d13
Fix issue with intra frames being part of gop when they shouldn't
2019-08-29 09:28:10 +03:00
Ari Lemmetti
aa8ab195d1
Compare rough cost of the best merge mode against AMVP to make mode decision
2019-08-26 22:49:09 +03:00
Ari Lemmetti
8f866ff83a
Use correct index
2019-08-26 20:10:10 +03:00
Ari Lemmetti
2343958a14
Fix transform split for small luma blocks
2019-08-24 21:50:17 +03:00
Ari Lemmetti
800fc8644d
Reset CBFs because they might have been set earlier for a previous depth.
2019-08-24 21:49:33 +03:00
Ari Lemmetti
a80de22bc7
Add only different candidates to the list
2019-08-24 21:49:33 +03:00
Ari Lemmetti
45c7961412
Remove tr depth fill. It should not be needed.
2019-08-24 21:49:32 +03:00
Ari Lemmetti
ff8711aaab
Add missing logic to add valid indices to list
2019-08-24 21:49:29 +03:00
Marko Viitanen
cb0d7c340a
Use the new PDPC filtering in angular intra
2019-08-23 14:44:41 +03:00
Marko Viitanen
5bebb18943
Change intra filtering according to VTM6
2019-08-23 08:56:35 +03:00
Marko Viitanen
a16efe6b52
Merge remote-tracking branch 'remotes/github_kvazaar/master'
...
# Conflicts:
# build/kvazaar_VS2013.sln
# build/kvazaar_VS2015.sln
# build/kvazaar_VS2017.sln
# build/kvazaar_cli/kvazaar_cli.vcxproj
# build/kvazaar_lib/kvazaar_lib.vcxproj
# build/kvazaar_tests/kvazaar_tests.vcxproj
# src/encode_coding_tree.c
# src/encode_coding_tree.h
# src/encoder_state-bitstream.c
# src/inter.c
# src/strategies/avx2/quant-avx2.c
2019-08-22 15:12:01 +03:00
Marko Viitanen
01ea762c1f
Fix coeff coding and remove bdpcm flag -> CABAC bits match with VTM 6.0
2019-08-22 14:33:42 +03:00
Marko Viitanen
210af8adbe
Remove joint_cb_cr flag and fix split_flag context selection
2019-08-22 11:23:24 +03:00
Marko Viitanen
c713d31c93
Fix sig_coeff context selection
2019-08-22 10:57:50 +03:00
Marko Viitanen
48b8898e53
Fix CBF context init and use
2019-08-22 10:44:47 +03:00
Marko Viitanen
db94ec1a84
Rename intra_mode_model -> intra_luma_mpm_flag_model and update the contexts
2019-08-19 15:17:25 +03:00
Marko Viitanen
1c6ffc0a7e
Fix wrong variable types in context init
2019-08-19 14:33:55 +03:00
Marko Viitanen
cd6be15e10
Fix context init to match VTM6.0
2019-08-19 13:57:31 +03:00
Marko Viitanen
3de198d2db
Sync contexts with VTM6.0
2019-08-19 09:39:59 +03:00
Marko Viitanen
e644b03615
Fix headers to match VTM6.0rc1
2019-08-16 15:33:20 +03:00
Ari Lemmetti
1dd0619bd7
Revert to 6924d90052 due to broken visual studio build
2019-08-08 15:15:34 +03:00
Pauli Oikkonen
2852baa673
Separate sign3_diff_epu8 from calc_eo_cat
...
Just to keep things simple, clear and obvious
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
17947b79ee
Add sao_shared_generics.h in Makefile.am
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a8dd6ce351
Add a note about having implemented a separate AVX2 version of SAO offset array calculation
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
a858e7dd4b
Combine duplicate code into inline functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
de0e97f711
Take 8/16/24b loads and stores into separate functions
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
10979f58fe
Tidy up code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
9cc11976c0
Combine the delta accumulation from edge and band ddistortion into shared func
...
This won't reduce object size, but there'll be less duplicate code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
55d877bd66
Vectorize sao_edge_ddistortion
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
aef0f301d3
Fix function signatures
...
Mark anything intended as read-only to be const, and fix alignment
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
997fd369b3
Redo calc_sao_edge_dir_avx2
...
Do it wider, 32 pixels at once!
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
db1e475e02
Use i32 instead of i8 for x/y offsets
...
Doesn't matter too much, because this number isn't used in SIMD
computation, only as a memory reference offset.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
12de466ef5
Reimplement non-band SAO color reconstruction in AVX2
...
Streamline things to work on 32 pixels at once instead of 8
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
e8bff99329
Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction
...
Vectorize it all, hope this helps with perf
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
7b5dffa855
Implement calc_sao_offset_array in AVX2
...
To be efficient, the AVX2 color reconstruction algorithm will need
offsets in byte, not dword, arrays. This is completely specific to 8-bit
pixels and the function signature is fundamentally distinct from the
generic algorithm, so it's better to not strategize SAO offset array
calculation.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
29563b7039
Make kvz_calc_sao_offset_array more obvious
...
Name temporary values from array lookups etc. that are referred to multiple
times, to make the behavior of the mechanism more transparent. Define
all the constant values at the beginning of the function and declare as
const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
08881f5e9b
(TEMP) (TODO) (whatever) Avoid compiler warnings
...
I want the CI to not crash on its -Wall -Werror, but instead to actually
build the thing and report me about actual memory errors etc
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
c18adc5ee0
Redo sao_band_ddistortion_avx2
...
Avoid branching and do the entire thing on 32 pixels at once in YMMs.
Also make the sao_bands function parameter const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
2827c3e3ab
Make calc_sao_bands less opaque
2019-08-07 16:35:24 +03:00
Pauli Oikkonen
1bb9a079a8
Fix indentation
2019-08-07 16:35:24 +03:00
Reima Hyvönen
7bc959c7c5
3 sao functions are now working
2019-08-07 16:35:24 +03:00
Reima Hyvönen
0e0f2d3490
made to clear sum vector after it has been set to memory
2019-08-07 16:35:24 +03:00
Reima Hyvönen
f146de7acb
removed some variables to prevent memory losses
2019-08-07 16:35:24 +03:00
Reima Hyvönen
247c3a7a71
converted signed to unsigned int
2019-08-07 16:35:24 +03:00
Reima Hyvönen
ac5c216974
Some more memory error prevention in sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3fb1cbca35
more editing sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
afbb6fb960
some more modifications to sao_edge_ddistortion_avx2 to prevent memory failures
2019-08-07 16:35:24 +03:00
Reima Hyvönen
3496a57f7a
Edited sao_edge_ddistortion_avx2 to avoid memory overflow
2019-08-07 16:35:24 +03:00
Reima Hyvönen
267ba1d6ce
Modified sao_band_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
e70663b245
added some sub commands to avoid memory read errors
2019-08-07 16:35:24 +03:00
Reima Hyvönen
59dfb4570c
Converted some loads to load int8_t instead ints
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8b253209a8
Found false address load from calc_sao_edge_dir. Should now work like generic
2019-08-07 16:35:24 +03:00
Reima Hyvönen
50e0a47b7a
Took away __restrict
2019-08-07 16:35:24 +03:00
Reima Hyvönen
8a39eb674e
Removed c-variable from calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
bc0a36830d
Clarified some 6 pixel loads
2019-08-07 16:35:24 +03:00
Reima Hyvönen
1a8b211e05
Added break to line 170
2019-08-07 16:35:24 +03:00
Reima Hyvönen
d05e750ebe
Added some switches to prevent segmentation fault from reading
2019-08-07 16:35:24 +03:00
Reima Hyvönen
203580047d
Defined some AVX functions
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c884c738b1
Updated some commands to match the standard
2019-08-07 16:35:24 +03:00
Reima Hyvönen
b412ed2f59
Removed some setr and used loads calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c6cc063534
converted some hadd operations at calc_sao_edge_dir_avx2 to cast and extract
2019-08-07 16:35:24 +03:00
Reima Hyvönen
47ac109b10
optimized some of sao_reconstruct_color_avx2 when sao->type == SAO_TYPE_BAND
2019-08-07 16:35:24 +03:00
Reima Hyvönen
96dc60a1ed
first working optimization
2019-08-07 16:35:24 +03:00
Reima Hyvönen
c148aff9fb
Some optimization done to function sao_reconstruct_color_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
bf16ba6cc4
Remade sao_edge_ddistortion_avx2 and calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
79dc39a676
Some editing for sao_edge_ddistortion_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
06ee52924e
some reconst done to calc_sao_edge_dir_avx2
2019-08-07 16:35:24 +03:00
Reima Hyvönen
5fbc65d823
reconst optimization doesn't work yet
2019-08-07 16:35:24 +03:00
Reima Hyvönen
d29f834a69
Remove useless function
2019-08-07 16:35:24 +03:00
Reima Hyvönen
a232a12160
calc_sao_edge_dir_avx2 updated
2019-08-07 16:35:24 +03:00
Reima Hyvönen
b1febc02a5
sao_edge_ddistortion_avx2 now working properly
2019-08-07 16:35:24 +03:00
Reima Hyvönen
cd6092a1ec
Still too many bits, looking for where they appear
2019-08-07 16:35:24 +03:00
Reima Hyvönen
7853be8eeb
Incomplete optimization
2019-08-07 16:35:24 +03:00
Marko Viitanen
dfa5621024
Intrapred cleanup
2019-07-16 14:23:10 +03:00
Ari Lemmetti
40609aa865
Add missing headers to Makefile.am
2019-07-12 19:15:51 +03:00
Ari Lemmetti
5db3a78499
Bump versions for release 1.3
2019-07-09 22:09:32 +03:00
Ari Lemmetti
d513ab1999
Add missing newline
2019-07-09 21:06:05 +03:00
Ari Lemmetti
4967072625
Do not bypass search on skip cu if early_skip is not enabled
2019-07-09 20:20:12 +03:00
Ari Lemmetti
b20992a9f3
Rename functions to be more descriptive
2019-07-09 20:20:11 +03:00
Ari Lemmetti
a348a0ec23
Fix transform depth in early skip
2019-07-09 20:05:48 +03:00
Pauli Oikkonen
8d48bee180
Tidy fast coeff cost code
2019-07-09 18:01:54 +03:00
Pauli Oikkonen
201a43b08e
Clean up the RD-estimation code
2019-07-09 18:01:54 +03:00
Pauli Oikkonen
b111df5073
Create preliminary version of improved cost estimator
2019-07-09 18:01:54 +03:00
Ari Lemmetti
be08a87d94
Add missing parameter max-merge to the help message
2019-07-09 16:28:46 +03:00
Ari Lemmetti
d0bb9b4a6d
Add parameter max-merge to presets
2019-07-09 16:26:03 +03:00
Ari Lemmetti
4097331fd6
Early skip
2019-07-09 15:59:31 +03:00
Marko Viitanen
10d850e98a
Use index_offset in intra angular and change the offset to width+1
2019-07-08 14:23:19 +03:00
Marko Viitanen
3d1fa2a9cf
Fixing angular intra prediction reference pixels
2019-07-08 14:00:02 +03:00
Marko Viitanen
0656c54cab
Fix some problems with reference pixels in angular intra prediction kvz_angular_pred_generic()
2019-07-05 15:54:51 +03:00
Marko Viitanen
89ca2d4ba1
Use correct type for modedisp2sampledisp array
2019-07-05 14:12:10 +03:00
Marko Viitanen
2e8a0d08f9
Fix mvp_idx_model initialization and use
2019-07-05 14:11:29 +03:00
Joose Sainio
977e885ea2
Fix issue with gop=0 introduced in 1c36f68d0c
2019-07-05 12:57:27 +03:00
Marko Viitanen
c6217e236f
Enable 4-tap filtering for the intra angular
2019-07-04 16:26:10 +03:00
Marko Viitanen
cda6d951c0
Change DCT arrays back to 8-bit -> some frames are now correct
2019-07-04 15:59:10 +03:00
Marko Viitanen
8280bd3217
Add channel info to angular_pred and fix the displacement tables.
...
Also includes 4-tap intra filtering code commented out
2019-07-04 09:35:47 +03:00
Marko Viitanen
5e4369d6b0
Fix the kvz_cabac_encode_aligned_bins_ep function -> cabac coding now correct
2019-07-03 15:55:52 +03:00
Marko Viitanen
3fad4b0a98
Disable kvz_cabac_encode_aligned_bins_ep for now and add a ToDo message
2019-07-03 15:44:35 +03:00
Sami Ahovainio
ce1e67cc3a
Modified header flags to match VTM commit b9080ff45bec368c44f0c43a32dcd6804ef9f5d6
2019-07-01 13:58:15 +03:00
Sami Ahovainio
3863064d90
Fixed bugs in split decision and coefficient coding.
2019-07-01 13:00:43 +03:00
Mikko Pitkänen
a7f09c8114
Merge branch 'threadwrapper'
2019-06-24 16:54:59 +03:00
Sami Ahovainio
db5c0230e5
Fixed coefficient sign hiding
2019-06-20 16:26:01 +03:00
Sami Ahovainio
b51254cafd
Fixed significant coefficient group context calculation
2019-06-20 15:47:13 +03:00
Sami Ahovainio
5e0bea962c
Fixed split context decision
2019-06-20 15:30:49 +03:00
Sami Ahovainio
12322144f0
Removed debug print from context.c
2019-06-20 15:18:22 +03:00
Sami Ahovainio
3a9800d07d
Fixed coefficient coding. Fixed headers to match VTM commit e65075531471a68632bc9252d607655a0feeabc6
2019-06-20 14:43:03 +03:00
Mikko Pitkänen
3dd606ce2e
Add new threadwrapper
2019-06-18 18:45:45 +03:00
Sami Ahovainio
2c78aa0642
Fixes to coeff coding.
2019-06-13 12:01:29 +03:00
Joose Sainio
c94077d15e
remove hardcoded value
2019-06-12 14:37:41 +03:00
Joose Sainio
ac68c8444d
remove negation that wasn't supposed to be there
2019-06-12 14:35:24 +03:00
Joose Sainio
5851dcc3be
missing negation
2019-06-12 14:08:18 +03:00
Joose Sainio
1c36f68d0c
Fix owf>=9 gop=8 and add test to catch such problem in future
2019-06-12 14:04:41 +03:00
Sami Ahovainio
3564b4829e
Fixed split context decision. Modified intra mode initialization to match VTM version aa76fc5c04cf43390f43d63f9977bea8ee31997a.
2019-06-12 12:59:16 +03:00
Sami Ahovainio
a8a53e15b5
Fixed headers to match VTM commit aa76fc5c04cf43390f43d63f9977bea8ee31997a. Added multi_ref_line flag coding.
2019-06-07 13:37:45 +03:00
Ari Lemmetti
933ff6ed55
Merge branch 'set-qp-in-cu-fix'
2019-06-07 09:01:03 +03:00
Sami Ahovainio
8d2581e58c
Fixed issue with kvz_go_rice_par_abs where passing an unsigned argument caused MIN function to return wrong value. Modified coefficient coding to match VTM 5.0. Some issues still remain.
2019-06-05 15:57:18 +03:00
Sami Ahovainio
367f1b2129
Fixed splitting bug caused by wrong values in the headers. Fixed header flags to match VTM commit 5703e81b2de677d976ec15423f5768b17619ba6a
2019-06-05 11:21:02 +03:00
Sami Ahovainio
76d56290ed
Fixed VUI header writing. Fixed debug prints of NAL headers and rbsp_stop_one_bit.
2019-05-31 11:13:11 +03:00
Ari Lemmetti
c6da839002
Set lcu sqrt lambda according to lcu lambda instead of frame lambda when ROI is used
2019-05-29 18:32:10 +03:00
Marko Viitanen
8282a18c36
Fixed headers and NAL writing to match the latest VTM master 988c22cbb9c58584cac3ef0ec7794cafbea6dfd6
2019-05-29 16:18:35 +03:00
Sami Ahovainio
4768ba0628
Minor fixes to header writing. Added contexts for multi_ref_line and BDPCM. Functions added for writing both in bitstream, but they are both disabled for now.
2019-05-29 13:00:19 +03:00
Sami Ahovainio
3339e12169
Fixed some header flags
2019-05-27 09:56:56 +03:00
Ari Lemmetti
9339845e8b
Set QP completely at CU level as the name '--set-qp-in-cu' implies
...
-Move slice delta QP to CU level when using --set-qp-in-cu
-Separate functionality from roi
2019-05-24 20:38:39 +03:00
Pauli Oikkonen
081d16fc33
Fix intrinsics that may be missing on some systems
...
Create a header to collect all the workarounds for missing intrinsics
in one place
2019-05-23 19:59:40 +03:00
Sami Ahovainio
5b46fbd878
Added multi_ref_idx variable for intra coding (is 0 throughout the code for now). Modified prediction flag writing. Chroma pred flag remains unchanged (ToDo). Added bitstream debug printing on VERBOSE mode.
2019-05-21 12:28:05 +03:00
Sami Ahovainio
ed4e218702
Updated coefficient coding to match VTM 5.0
2019-05-13 15:30:43 +03:00
Sami Ahovainio
504c3dfd1b
Modified the headers to match current VTM headers
2019-05-07 16:30:06 +03:00
Marko Viitanen
30a8a7b97c
WIP fixing the last significant xy coding
2019-05-07 15:01:02 +03:00
Pauli Oikkonen
87a9208db8
Eliminate cvtsi64_si128 intrinsic
...
Apparently it'll cause Win32 builds to break because it emits the movq
instruction or something..
2019-04-17 16:30:40 +03:00
Pauli Oikkonen
7175d20bb2
Still include stdint.h for non-vector builds
2019-04-15 19:36:01 +03:00
Pauli Oikkonen
1315c7e2b0
Do not compile any vector code for non-SSE4/AVX2 builds
2019-04-15 19:10:48 +03:00
Pauli Oikkonen
f5f70e7bc5
Merge branch 'sad-optimization'
2019-04-15 19:02:01 +03:00
Jan Beich
85f46e17a9
Detect AltiVec via elf_aux_info() on FreeBSD 12+
2019-04-01 13:08:04 +00:00
Jan Beich
82486255da
Simplify AltiVec detection on Linux
2019-04-01 13:08:04 +00:00
Marko Viitanen
1546acfdb9
New NAL unit IDs and header changes
2019-03-28 10:11:36 +02:00
Marko Viitanen
36eab9c170
New cabac context models with "rate"
2019-03-27 12:38:19 +02:00
Marko Viitanen
3bdc8ac8d3
Fix intra_chroma_pred_mode and cbf contexts
2019-03-26 09:10:09 +02:00
Marko Viitanen
d15f58517f
Changed intra coding to use 6 MPM, implemented merge sort and MPM selection
2019-03-20 15:20:31 +02:00
Marko Viitanen
1081336868
Updated intra pred mode init values
2019-03-20 15:18:32 +02:00
Marko Viitanen
f3acd245ae
New cabac coding function: kvz_cabac_encode_trunc_bin
2019-03-20 15:17:54 +02:00
Marko Viitanen
80d6e4bf05
New split flag calculations
2019-03-20 09:07:58 +02:00
Marko Viitanen
8c84348010
New entropy bit table
2019-03-20 09:07:22 +02:00
Marko Viitanen
2d0348aa6d
New context models
2019-03-20 09:06:57 +02:00
Marko Viitanen
052080747e
New CABAC functions
2019-03-20 09:06:26 +02:00
Marko Viitanen
20667fdba6
Update header bits to VTM 4.0+
2019-03-11 14:02:12 +02:00
Pauli Oikkonen
6d43759604
Create a border-respecting 32-wide AVX hor_sad
2019-03-07 18:01:22 +02:00
Pauli Oikkonen
f218cecb38
Remove offending hor_sad_avx2_w32 function
...
Consider possibly creating a non-offending AVX2 version instead, the
way hor_sad_sse41_w32 works. Or maybe there's more essential work to
do.
2019-03-05 22:51:41 +02:00
Pauli Oikkonen
df2e6c54fd
4-unroll hor_sad_sse41_arbitrary
...
This may not increase perf though because it's such a rarely used
function, so keeping icache footprint small may be more essential...
2019-03-05 22:45:23 +02:00
Pauli Oikkonen
448eacba7b
Avoid overreading block borders in hor_sad_sse41_arbitrary
2019-03-05 22:34:50 +02:00
Eemeli Kallio
c159e275b7
Merge branch 'max_merge'
2019-03-05 14:39:03 +02:00
Pauli Oikkonen
41f51c08c4
Avoid overrunning buffer in hor_sad_sse41_w32
2019-03-01 15:37:38 +02:00
Pauli Oikkonen
bcd9879359
Include quant coeff range check in non-scaling list execution path too
2019-02-27 17:26:44 +02:00
Pauli Oikkonen
24e6363f64
Remove the kvz_quant_avx2 wrapper function
2019-02-27 16:32:58 +02:00
Pauli Oikkonen
748820f3c5
Eliminate unnecessary loading of coeffs if scaling lists are off
2019-02-27 16:26:35 +02:00
Pauli Oikkonen
5994350f40
Allow quant_flat_avx2 to be used with scaling lists on
2019-02-27 16:25:59 +02:00
Eemeli Kallio
7f4e0acf41
Added check if max-merge is out of bounds
2019-02-19 13:53:42 +02:00
Pauli Oikkonen
9b0e079262
Use SSE instructions for 64-bit SADs instead of MMX
...
VC++ seems to choke on MMX instructions
2019-02-18 20:13:33 +02:00
Pauli Oikkonen
d8b8923028
Add LGPL notices to reg_sad headers
2019-02-18 17:52:47 +02:00
Eemeli Kallio
2a40560888
some variables to const
2019-02-12 11:24:10 +02:00
Eemeli Kallio
8f8e7bb53c
Added possibility to reduce the maximum number of merge candidates.
2019-02-12 09:21:03 +02:00
Marko Viitanen
1165219842
Update PTL, SPS ext and SPS flags to match VTM 4rc1
2019-02-07 10:00:04 +02:00
Pauli Oikkonen
770db825b9
Create hor_sad_w8 and w4 epol mask the way w16 works
2019-02-06 19:34:26 +02:00
Pauli Oikkonen
aa19bcac8a
Avoid branching in creating shuffle mask in hor_sad_w16
2019-02-06 18:58:46 +02:00
Pauli Oikkonen
2d05ca8520
Remove width from constant-width hor_sad func params
...
They should kinda know it already
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
57db234d95
Move 32-wide SSE4.1 hor_sad to picture-sse41.c
...
It's not used by picture-avx2.c that also includes the header, so
it should not be in the header
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
dd7d989a39
Implement 32-wide hor_sad on AVX2
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ff70c8a5ec
Utilize horizontal SAD functions for SSE4.1 as well
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
f5ff4db01f
4-wide hor_sad border agnostic
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
35e7f9a700
Fix hor_sad w8 to work with both borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
836783dd6e
Use hor_sad_w32 for both left and right borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
69687c8d24
Modify hor_sad_sse41_w16 to work over left and right borders
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
51c2abe99a
Modify image_interpolated_sad to use kvz_hor_sad
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
1e0eb1af30
Add generic strategy for hor_sad'ing a non-split width block
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
686fb2c957
Unroll arbitrary-width SSE4.1 hor_sad by 4
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
768203a2de
First version of arbitrary-width SSE4.1 hor_sad
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ccf683b9b6
Start work on left and right border aware hor_sad
...
Comes with 4, 8, 16 and 32 pixel wide implementations now, at some point
investigate if this can start to thrash icache
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
760bd0397d
Pad the image buffer by 64 bytes from both ends
...
This will be necessary for an efficient and straightforward
implementation of hor_sad for blocks over 16 pixels wide, because they
cannot use the shuffle trick because inter-lane shuffling is so hard to
do
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
c36482a11a
Fix bug in 24-wide SAD
...
*facepalm*
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
f781dc31f0
Create strategy for ver_sad
...
Easy to vectorize
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
ca94ae9529
Handle extrapolated blocks with unmodified width using optimized_sad pointer
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
91b30c7064
Tidy up kvz_image_calc_sad
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
9db0a1bcda
Create get_optimized_sad func for SSE4.1
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
91380729b1
Add generic get_optimized_sad implementation
...
NOTE: To force generic SAD implementation on devices supporting
vectorized variants, you now have to override both get_optimized_sad
and reg_sad to generic (only overriding get_optimized_sad on AVX2
hardware would just run all SAD blocks through reg_sad_avx2). Let's
see if there's a more sensible way to do it, but it's not trivial.
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
45f36645a6
Move choosing of tailored SAD function higher up the calling chain
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
91cb0fbd45
Create strategy for directly obtaining pointer to constant-width SAD function
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
94035be342
Unify unrolling naming conventions
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
517a4338f6
Unroll SSE SAD for 8-wide blocks to process 4 lines at once
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
0f665b28f6
Unroll arbitrary width SSE4.1 SAD by 4
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
cbca3347b5
Unroll 64-wide AVX2 SAD by 2
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
84cf771dea
Unroll 32 and 16 wide SAD vector implementations by 4
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
5df5c5f8a4
Cast all pointers to const types in vector SAD funcs
...
Also tidy up the pointer arithmetic
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
a711ce3df5
Inline fixed width vectorized SAD functions
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
6504145cce
Remove 16-pixel wide AVX2 SAD implementation
...
At least on Skylake, it's noticeably slower than the very simple
version using SSE4.1
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
4cb371184b
Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
796568d9cc
Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
4d45d828fa
Use constant-width SSE4.1 SAD funcs for AVX2
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
2eaa7bc9d2
Move SSE4.1 SAD functions to separate header
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
d2db0086e1
Create constant width SAD versions for 8 and 16 pixels
2019-02-04 20:41:40 +02:00
Pauli Oikkonen
a13fc51003
Include a blank AVX2 strategy registration function even in non-AVX2 builds
2019-02-04 19:52:24 +02:00
Pauli Oikkonen
d55414db66
Only build AVX2 coeff encoding when supported
...
..whoops
2019-02-04 19:34:30 +02:00
Pauli Oikkonen
3fe2f29456
Merge branch 'encode-coeffs-avx2'
2019-02-04 18:52:31 +02:00
Pauli Oikkonen
722b738888
Fix more naming issues
2019-02-04 16:05:43 +02:00
Pauli Oikkonen
e26d98fb75
Rename a couple variables and add crucial comments
2019-02-04 15:57:07 +02:00
Pauli Oikkonen
f186455619
Move encode_last_significant_xy out of strategy modules
...
It's the exact same in both AVX2 and generic, and does not seem to
be worth even trying to vectorize
2019-02-04 14:55:41 +02:00
Pauli Oikkonen
3f7340c932
Fine-tune pack_16x16b_to_16x2b
...
Avoid mm_set1 operation when it's possible to create the constant with
one bit-shift operation from another instead. Thanks Intel for
3-operand instruction encoding!
2019-02-04 14:44:47 +02:00
Pauli Oikkonen
314f5b0e1f
Rename 16x2b cmpgt function, comment it better, optimize it slightly
...
Eliminate an unnecessary bit masking to make it even more messy
2019-02-04 14:44:32 +02:00
Pauli Oikkonen
d8ff6a6459
Fix _andn_u32 to work on old Visual Studio
2019-02-01 15:34:42 +02:00
Pauli Oikkonen
26e1b2c783
Use (u)int32_t instead of (unsigned) int in reg_sad_sse41
2019-01-10 14:37:04 +02:00
Pauli Oikkonen
3a1f2eb752
Prefer SSE4.1 implementation of SAD over AVX2
...
It seems that the 128-bit wide version consistently outperforms the
256-bit one
2019-01-10 13:48:55 +02:00
Pauli Oikkonen
9b24d81c6a
Use SSE instead of AVX for small widths
...
Highly dubious if this will help performance at all
2019-01-07 20:12:13 +02:00
Pauli Oikkonen
b2176bf72a
Optimize SSE4.1 version of SAD
...
Make it use the same vblend trick as AVX2. Interestingly, on my test
setup this seems to be faster than the same code using 256-bit AVX
vectors.
2019-01-07 19:40:57 +02:00
Pauli Oikkonen
887d7700a8
Modify AVX2 SAD to mask data by byte granularity in AVX registers
...
Avoids using any SAD calculations narrower than 256 bits, and
simplifies the code. Also improves execution speed
2019-01-07 18:53:15 +02:00
Pauli Oikkonen
7585f79a71
AVX2-ize SAD calculation
...
Performance is no better than SSE though
2019-01-07 16:26:24 +02:00
Pauli Oikkonen
ab3dc58df6
Copy SAD SSE4.1 impl to AVX2
2019-01-03 18:31:57 +02:00
Pauli Oikkonen
45ac6e6d03
Tidy pack_16x16b_to_16x2b comments
2019-01-03 16:37:05 +02:00
Ari Lemmetti
cd818db724
Add missing quantization and residual in cost calculation (inter rd=2).
2018-12-21 15:55:29 +02:00
Pauli Oikkonen
016eb014ad
Move packing 16x16b -> 16x2b into separate function
2018-12-20 10:51:44 +02:00
Ari Lemmetti
b234897e8a
Fix smp and amp blocks in fme and revert previous change.
...
Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc.
Calculate SATD on the 8x4, ... part
2018-12-19 21:30:53 +02:00
Pauli Oikkonen
9aaa6f260d
Fixes to enable portability
2018-12-18 20:42:09 +02:00
Pauli Oikkonen
2fdbbe9730
Move CG reordering code from quant-avx2 to shared header
2018-12-18 19:42:18 +02:00
Pauli Oikkonen
d02207306d
Create a header file for shared AVX2 code
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
361bf0c7db
Precompute >=2 coeff encoding loop with 2-bit arithmetic
...
Who needs 16x16b vectors when you can do practically the same with
16x2b pseudovectors in 32-bit general purpose registers!
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
940b0e9e6a
Require BMI2 for AVX2 build
...
Any processor implementing AVX2 should also implement BMI2
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
f66cb23d5b
Optimize greater1 encoding loop
...
Calculating the c1 variable need not be a serial operation!
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
8c8b791c35
Vectorize kvz_context_get_sig_ctx_inc
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
033261eb74
Eliminate two branches using bit magic
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
c4434e8d04
Scan CG's in forward order to simplify finding last significant
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
efd097f5a5
Vectorize the coeff group loop to some extent
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
a01362e638
use the efficient method of reordering raster->scan
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
50a888e789
Use the efficient method to find first and last nz coeffs in block
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
7e9203f566
Scan coeff groups in scan order to help find last significant one
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
9a5a6fdbc7
Simplify two ifs in encode_coeff_nxn-avx2
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
37a2a8bac8
See if loop can be optimized by rearranging
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
584f2f74b6
Vectorize significant coeff group scanning loop
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
1bfed73221
Add AVX2 strategy for encode_coding_tree
2018-12-18 19:41:09 +02:00
Pauli Oikkonen
c3a6f3112a
Add generic strategy group for encode_coding_tree
2018-12-18 19:41:09 +02:00
Marko Viitanen
1ef851ab4b
Disable FME on amp/smp blocks with width or height not divisible by 8
2018-12-18 10:28:21 +02:00
Joose Sainio
b71c5573f0
Merge branch 'rate_control_fix'
2018-12-17 12:39:27 +02:00
Sergei Trofimovich
68a70e45a1
x86 asm: mark stack as non-executable
...
Gentoo's `scanelf` QA tool detects writable/executable stack
of assembly-written files as:
```
$ scanelf -qRa .
0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o
0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o
0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-sad.o
0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-satd.o
```
Normally C compiler emits non-executable stack marking (or GNU assembler
via `-Wa,--noexecstack`).
The change adds non-executable stack marking for yasm-based assembly files.
https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details.
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2018-12-16 11:31:56 +00:00
Reima Hyvönen
1fcc5c6a8d
Merge branch 'bipred_recon'
2018-12-11 09:59:35 +02:00
Reima Hyvönen
e4a10880f3
Added case 12 to bipred_recon no mov
2018-12-11 09:52:17 +02:00
Marko Viitanen
a4f3968e52
Fix Visual Studio errors by initializing some variables used in AVX2 signhiding
2018-12-11 09:33:26 +02:00
Ari Lemmetti
ac943147e3
Calculate satd cost for whole non-square blocks as well.
2018-12-10 17:04:29 +02:00
Pauli Oikkonen
c465578048
Add a descriptive comment to coefficient reordering
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
f78bf2ebcb
Optimize q_coefs usage for indexed fetch
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
d9591f1b49
Eliminate midway buffering of reordered coefs
...
TODO: For some mysterious reason seems slightly slower than the
buffered one
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
7fe454c51f
Optimize get_cheapest_alternative()
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
6bbd3e5a44
Optimize rearrange_512 function
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
cb8209d1b3
Vectorize transform coefficient reordering loop
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
7cf4c7ae5f
Rename "reduce" functions to hsum
...
That's what the functions fundamentally do anyway
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
316cd8a846
Fix ALIGNED keyword and grow alignment to 64B
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
1befc69a4c
Implement sign bit hiding in AVX2
2018-12-03 15:36:32 +02:00
Pauli Oikkonen
c5cd03497e
Require BMI and ABM instruction sets for AVX2 build
...
AVX2 support on a processor should always imply BMI and ABM support.
The lzcnt and tzcnt instructions have more suitable semantics in the
corner case that source word is 0, and allow us to even handle that
scenario without a branch. Apparently Visual Studio will already
include this support when building with AVX2 enabled, so only the
automake files need to be tweaked.
2018-12-03 15:36:32 +02:00
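The zero-input corner case mentioned in the commit above can be sketched in portable C. This is only an illustration of the semantics, not code from the repository: `tzcnt32_sketch` and `lzcnt32_sketch` are hypothetical helpers that mimic what the `tzcnt` and `lzcnt` instructions return, namely the operand width (32) when the source word is 0. The older `bsf`/`bsr` instructions leave the result undefined in that case, so callers usually need an explicit zero check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of tzcnt: number of trailing zero bits.
 * Returns 32 for a zero input, like the hardware instruction. */
static uint32_t tzcnt32_sketch(uint32_t x)
{
  uint32_t n = 0;
  if (x == 0) return 32;
  while ((x & 1u) == 0) {
    x >>= 1;
    n++;
  }
  return n;
}

/* Hypothetical software model of lzcnt: number of leading zero bits.
 * Returns 32 for a zero input, like the hardware instruction. */
static uint32_t lzcnt32_sketch(uint32_t x)
{
  uint32_t n = 0;
  if (x == 0) return 32;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    n++;
  }
  return n;
}
```

With these semantics, an expression such as `31 - (int)lzcnt32_sketch(mask)` gives the index of the highest set bit and evaluates to -1 for an empty mask, so the caller does not need a separate branch for that case. The software model above still branches internally; only the hardware instruction is branch-free.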
Reima Hyvönen
f8696b54a4
Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks whose width is not 8 (e.g. width = 12)
2018-11-20 17:09:19 +02:00
Marko Viitanen
a5a10a33c3
Enable --scaling-list parameter and add to the documentation
2018-11-19 10:47:30 +02:00
Reima Hyvönen
710ba288db
Chroma has some problems
2018-11-15 16:42:48 +02:00
Sami Ahovainio
8f98d4aac7
Added square search
2018-11-14 14:50:31 +02:00
Marko Viitanen
6871490dd5
Simplify get_mvd_coding_cost(), only include golomb coding
2018-11-14 14:33:31 +02:00
Ari Lemmetti
a832206bb6
Replace 32-bit incompatible intrinsics
2018-11-12 18:54:33 +02:00
Ari Lemmetti
5c774c4105
Rewrite most of FME and interpolation filters
...
Changes had to break a lot of stuff and were just squashed into this horrible code dump
2018-11-08 20:21:16 +02:00
Joose Sainio
1c8a1f24e2
Don't assume anything about bits spent
2018-11-07 16:03:38 +02:00
Joose Sainio
3471e2470d
Fix using uninitialized value for the first frame
2018-11-07 08:17:39 +02:00
Joose Sainio
d95ac11a3b
Fix rate_control for other LP-GOPS
2018-11-06 14:20:44 +02:00
Joose Sainio
67a6ba667e
Fix rate control for flat lp-gop
2018-11-06 09:38:17 +02:00
Reima Hyvönen
7406c33a42
Some more cleaning
2018-10-26 12:25:18 +03:00
Reima Hyvönen
4c71546b2e
Cleaned up some code
2018-10-26 12:19:44 +03:00
Reima Hyvönen
4fe3909e48
Switched luma to use 32-bit ints instead of 16-bit ints
2018-10-24 18:24:46 +03:00
Marko Viitanen
465bc2cfee
[EMT] make functions static and prefix arrays with kvz_g
2018-10-18 10:54:33 +03:00
Marko Viitanen
b133e7de1e
VTM 2.2 changed -> remove high_precision_motion_vectors flag
2018-10-17 12:41:14 +03:00
Marko Viitanen
169febd1c4
[EMT] Simplify DCT8, DCT5, DST1 and DST7 definitions
2018-10-17 12:17:54 +03:00
Marko Viitanen
e015d7eb2b
Fix compiler warnings
2018-10-17 10:43:11 +03:00
Marko Viitanen
ad310c77d3
Added EMT transforms to the strategies
2018-10-17 08:56:49 +03:00
Eemeli Kallio
284e73839e
Calculating zero cost moved to its own function
2018-10-16 11:02:01 +03:00
Reima Hyvönen
381e786e10
Trying to find the bug in luma
2018-10-11 18:08:41 +03:00
Marko Viitanen
c589e5ed36
Fix closed-gop frame feed, the ordering was incorrect after the first GOP
2018-10-10 11:12:03 +03:00
Reima Hyvönen
2f5f81bac3
Removed the non-optimized bipred function
2018-10-09 11:19:23 +03:00
Marko Viitanen
75dce4f3ce
Fix low-delay-gop usage with --no-open-gop
2018-10-04 15:16:02 +03:00
Marko Viitanen
de71b58f76
Change closed GOP structure to include an additional IDR between GOPs
2018-10-04 11:17:03 +03:00
Marko Viitanen
1e1a80e4a6
[TMVP] fix clamping of block offsets and clean up the code a bit
2018-10-03 12:34:48 +03:00
Reima Hyvönen
212a8e68fa
Modified to avoid memory overflow, still some bug inside luma
2018-10-02 20:23:32 +03:00
Marko Viitanen
954f07e3d7
Add --(no-)open-gop option
2018-10-02 10:05:32 +03:00
Marko Viitanen
027359c3c3
Implement TMVP duplicate checking as in VTM 2.1
2018-09-28 11:50:36 +03:00
Marko Viitanen
571a545416
Fix spatial merge candidate selection
2018-09-26 15:10:31 +03:00
Marko Viitanen
63760ca0cf
Use kvz_cabac_bins_verbose flag to control cabac debug printing
2018-09-26 12:01:23 +03:00
Marko Viitanen
7c37f456f9
Fix implicit Qt split for p-frames
2018-09-26 12:00:18 +03:00
Marko Viitanen
b6f2c66c73
Fixed intra Most Probable Mode (mpm) derivation to conform to VTM 2.1
2018-09-21 10:33:54 +03:00
Sami Ahovainio
a2b2275d87
Fixed array sizes in search_intra_rough from 35 to 67
2018-09-18 11:49:15 +03:00
Sami Ahovainio
82fb80ab6e
Fixed a couple of if-clauses that still used the old intra mode range.
2018-09-17 08:56:43 +03:00
Marko Viitanen
a437d4c508
Fixed intra chroma mode bitstream writing (chroma search not used)
2018-09-13 15:05:00 +03:00
Marko Viitanen
389aeebe07
Added 2x2 transform functions
2018-09-13 14:51:07 +03:00
Marko Viitanen
445c059b4a
Fix transforms for VTM 2.0, generated new transform matrices and added a shift by 2 for forward and inverse
2018-09-13 14:39:49 +03:00
Marko Viitanen
35fa8e9785
Fix kvz_intra_get_dir_luma_predictor -> Intra working
2018-09-13 12:32:17 +03:00
Marko Viitanen
f75b0b11c3
Simplify intra filtered ref pixel selection
2018-09-13 10:09:52 +03:00
Sami Ahovainio
4bb484a86a
Fixed if-clause at search_intra.c to use new wider range of intra modes
2018-09-13 09:58:48 +03:00
Marko Viitanen
82de0fbee7
Switch intra search to use the actual 67 modes
2018-09-13 09:43:45 +03:00
Marko Viitanen
382917bcd3
New table for choosing angular intra filtered references and a small bugfix on the end condition of angular intra
2018-09-13 09:35:55 +03:00
Marko Viitanen
4aad2fa383
Fix intra mode writing
2018-09-12 10:34:58 +03:00
Marko Viitanen
d4ed0ee3ad
Fixed some array offsets in intra angular prediction
2018-09-12 08:53:17 +03:00
Marko Viitanen
20c96366ed
fix kvz_context_get_sig_ctx_idx_abs() parameter for "type" -> decoding with VVC
2018-09-10 12:51:02 +03:00
Marko Viitanen
a7ca09108c
Improve CABAC debugging by including similar info as in VTM
2018-09-10 11:00:00 +03:00
Sami Ahovainio
ce84407c69
Fixed coeff_remain writing to use the correct rice_param instead of using 0 all the time.
2018-09-07 11:24:24 +03:00
Sami Ahovainio
78ea24bcf1
Fixed sig_coeff_flag writing condition.
2018-09-06 15:48:45 +03:00
Marko Viitanen
4bebb4bb2c
Fix temp_diag and temp_sum initialization and coeff array usage in context derivation
2018-09-05 17:09:50 +03:00
Marko Viitanen
f5b6c386bc
Fix incorrect sig_flag implicity parameters and some temp variable initializations
2018-09-03 16:22:05 +03:00
Marko Viitanen
8bef85e056
Merge branch 'set-qp-in-cu'
2018-09-03 08:33:33 +03:00
Ari Lemmetti
2fdcc2b79d
Add option --set-qp-in-cu
2018-09-03 08:32:45 +03:00
Marko Viitanen
52be2f0bbe
Fixed kvz_encode_coeff_nxn and renamed some variables to match VTM
2018-08-31 15:10:17 +03:00
Sami Ahovainio
787264f568
Fixed dst indexing in kvz_angular_pred_generic
2018-08-31 10:36:28 +03:00
Sami Ahovainio
d2291fea83
Intra mode scaling moved from angular prediction to kvz_intra_predict. pdpc implemented in kvz_intra_predict.
2018-08-31 10:01:28 +03:00
Marko Viitanen
49a116ed3a
Bugfix correct array sizes for cu_ctx_last_x/y
2018-08-30 16:14:08 +03:00
Sami Ahovainio
84cef127dc
Fixed cu_gtx_flag_model_chroma initialization.
2018-08-30 15:21:16 +03:00
Marko Viitanen
7d491e639b
Add new values to last_x/y coding
2018-08-30 15:04:04 +03:00
Marko Viitanen
809805b185
Bugfixes for kvz_encode_coeff_nxn()
2018-08-30 14:50:29 +03:00
Marko Viitanen
0680f240d7
Converted kvz_encode_coeff_nxn and related helper functions to VVC K0072 format
2018-08-30 14:24:03 +03:00
Marko Viitanen
84e78c6c50
Disable writing of cabac flags not currently available
2018-08-30 11:21:44 +03:00
Marko Viitanen
e3dbaf99a9
Started implementing new coeff coding function
...
- added kvz_context_get_sig_ctx_idx_abs for abs sig context derivation
2018-08-30 11:09:42 +03:00
Marko Viitanen
e00319b832
Fix cu_sig_coeff_group_model init and some instances of cu_sig_model usage
2018-08-30 09:08:08 +03:00
Marko Viitanen
4429e0b89d
Expand cu_sig_coeff_group_model according to VVC
2018-08-29 16:20:34 +03:00
Sami Ahovainio
578122ed43
Context changes for chroma pred modes. BT flag init and chroma pred mode init moved inside a loop.
2018-08-29 16:00:08 +03:00
Sami Ahovainio
54ebadfc43
Clarifying comments and changes towards WAIP
2018-08-29 16:00:08 +03:00
Marko Viitanen
7f119e8bdd
Added new ctx models for sig, parity and gtx, removed models for one and abs
2018-08-29 15:57:40 +03:00
Marko Viitanen
46d02c1734
Implemented JVET-K0072 based cbf context selections
2018-08-29 10:12:07 +03:00
Marko Viitanen
bb9dc22336
Disable PCM
2018-08-29 09:59:53 +03:00
Marko Viitanen
23a1292f52
Added max_binary_tree_unit_size and more comments
2018-08-29 08:23:41 +03:00
Marko Viitanen
37caa451c6
Fix VVC split flag condition for hor and ver splits at the edges
...
- Split flag is no longer implicit when the block can be split with the BT after QT in horizontal or vertical way
2018-08-28 16:03:02 +03:00
Reima Hyvönen
896034b7cf
Renamed some functions back
2018-08-28 15:31:10 +03:00
Reima Hyvönen
e8b5e6db4c
Did some merging
2018-08-28 15:26:27 +03:00
Reima Hyvönen
7de5c74434
Updated bipred_recon to work faster
2018-08-28 15:12:31 +03:00
Reima Hyvönen
47b357cca2
Comment one test
2018-08-27 18:52:14 +03:00
Reima Hyvönen
2ca99a44e8
Updated shuffle operation to be in right order
2018-08-27 18:16:38 +03:00
Sami Ahovainio
42741a2c40
Some changes for PCM and Intra towards VTM 2.0 compatibility.
2018-08-27 09:18:15 +03:00
Marko Viitanen
3dc5f65fba
Add an extra bit to intra mode and map 33 angular modes to 65
2018-08-17 15:09:48 +03:00
Marko Viitanen
9aaf53fcd7
Add dep_quant_enable_flag to slice header
2018-08-17 14:58:57 +03:00
Marko Viitanen
dc92fa6fb3
Added missing ALF flag to SPS
2018-08-17 12:53:27 +03:00
Marko Viitanen
dbc74c592d
Add VTM 2.0 new flags to SPS
2018-08-17 12:47:29 +03:00
Marko Viitanen
17505c8306
Disable vertical and horizontal scan order with small blocks
...
- Intra now working down to 8x8 luma
2018-08-17 11:38:40 +03:00
Marko Viitanen
4f7da86285
Commented out sign hiding code, which is not used in VVC
2018-08-17 09:38:11 +03:00
Marko Viitanen
c9cbdd5dc3
Added a couple of ToDo comments for large CTU support
2018-08-17 09:37:14 +03:00
Marko Viitanen
daf041406f
Disable DST
2018-08-16 16:05:32 +03:00
Marko Viitanen
b85ae3688e
Signal QP in slice header if tiles and slices=tiles are enabled
...
Keeps the PPS constant for various purposes
2018-08-16 08:44:39 +03:00
Sami Ahovainio
5baab86597
Added BT split flags
2018-08-14 15:28:06 +03:00
Marko Viitanen
b33aa37484
Enable max_trans_hier_depth values and disable DC and angular filtering
2018-08-14 15:24:21 +03:00
Marko Viitanen
00a827007a
Use normal split flags
2018-08-14 10:57:32 +03:00
Reima Hyvönen
508b218a12
some modifications made to prevent reading too much
2018-08-14 10:50:39 +03:00
Reima Hyvönen
1d935ee888
some useless stuff removed
2018-08-13 16:47:11 +03:00
Reima Hyvönen
ce3ac4c05e
some modifications to no_mov
2018-08-13 16:41:02 +03:00
Reima Hyvönen
15a613ae94
test if no_mov breaks testing
2018-08-13 16:02:56 +03:00
Reima Hyvönen
97a2049e58
removed pointer declaration out from switch
2018-08-10 16:42:26 +03:00
Reima Hyvönen
aa94bcedbc
Stream is now a pointer
2018-08-10 16:38:49 +03:00
Reima Hyvönen
fa5b227ece
256 to 32 doesn't work, made them by hand
2018-08-10 16:01:20 +03:00
Reima Hyvönen
408dedbcc8
removed _mm256_extract_epi8 and replaced with _mm_stream
2018-08-10 15:53:26 +03:00
Reima Hyvönen
31c35091c6
_mm256_cvtsi256_si32 removed
2018-08-10 10:06:40 +03:00
Reima Hyvönen
99dc43074f
_mm256_cvtsi256_si32 breaks system, too many bits. Back to extract
2018-08-10 09:59:33 +03:00
Reima Hyvönen
4f1f80b2cb
Transformed convert from 256 to cast 256 -> 128 and then convert from 128
2018-08-09 15:35:54 +03:00
Reima Hyvönen
4957555eb3
Removed leftover from 939
2018-08-09 15:25:03 +03:00
Reima Hyvönen
28b165c971
Clarified some sections, added _MM_SHUFFLE macro
2018-08-09 15:23:01 +03:00
Reima Hyvönen
dd04df8667
testing if error in both avx2 functions
2018-08-03 11:49:00 +03:00
Reima Hyvönen
ed50d71fde
Switched some variables to different location, altered inter_recon_bipred_avx2 function
2018-08-02 16:08:59 +03:00
Reima Hyvönen
f5739a0028
Renaming and removing useless prints
2018-08-02 14:47:17 +03:00
Reima Hyvönen
bc09f59bb6
Edited some definitions
2018-08-02 11:54:53 +03:00
Marko Viitanen
ffbc178cf9
An attempt to fix checksums
2018-07-27 14:38:05 +03:00
Marko Viitanen
84b6a61193
Hack to fix split flag model for PCM use -> valid VVC bitstream
2018-07-27 14:29:31 +03:00
Marko Viitanen
90174f1143
Add more values to cabac debugging
2018-07-27 13:59:54 +03:00
Marko Viitanen
c6572d644f
Updated split_flag initialization to support Large CTUs in VVC
2018-07-27 12:32:45 +03:00
Marko Viitanen
7abadaafe4
Disable CTU splitting and configure max CTU sizes to 64x64
2018-07-27 11:04:21 +03:00
Marko Viitanen
6921e31502
Fix debugging functions
2018-07-27 11:03:16 +03:00
Marko Viitanen
37b5ce3d33
Change configurations to ease VVC debugging, max-BT-depth = 0
2018-07-26 16:12:11 +03:00
Marko Viitanen
792da1b7e0
Force PCM coding and fix PCM sample output
2018-07-26 11:05:31 +03:00
Marko Viitanen
5d4a2a004f
Remove dependent slice, wpp/tile and scaling list parameters from PPS
2018-07-26 10:43:21 +03:00
Marko Viitanen
31a6cbfe6d
Disable sign bit hiding
2018-07-26 10:41:35 +03:00
Marko Viitanen
9f2b429c66
Disable some features not used in VVC
...
- Part mode coding not used
- split transform flag not used
- last significant coeff pos swapping not used
2018-07-26 10:33:27 +03:00
Marko Viitanen
e84276f7f6
Fixed version string
2018-07-26 08:17:55 +03:00
Marko Viitanen
e38109d102
Enable QTBT and set correct general_profile_idc for Next
2018-07-25 12:24:17 +03:00
Marko Viitanen
079ca9b8b2
Disable tile/wpp flags in slice header
2018-07-25 11:19:53 +03:00
Marko Viitanen
b0ac7002e5
Disable VPS
2018-07-25 11:02:09 +03:00
Marko Viitanen
c5bf6a3774
Bugfix: add missing parameters to WRITE_U
2018-07-25 10:18:48 +03:00
Marko Viitanen
9befe35961
Modify slice header to conform to VVC
2018-07-25 10:17:42 +03:00
Marko Viitanen
95ce1e1a25
Modify parameter sets to conform to VVC
2018-07-25 10:05:11 +03:00
Arttu Ylä-Outinen
83555c3d6d
Enable --fast-residual-cost with fastest presets
2018-07-16 12:31:20 +03:00
Arttu Ylä-Outinen
c438bb4a19
Add an option to skip CABAC for residual costs
...
Adds command line option --fast-residual-cost=<limit>. When QP is below
the limit, estimates the cost of coding the residual coefficients from
the sum of absolute coefficients. Skipping CABAC is not worth it with
high QPs because there are fewer coefficients so CABAC is not as slow.
2018-07-16 12:31:20 +03:00
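The idea behind `--fast-residual-cost` described above can be sketched as follows. This is a hedged illustration, not the encoder's implementation: all function names are hypothetical, `placeholder_cabac_cost` merely stands in for the real CABAC-based bit count, and any scaling the encoder applies to the sum of absolute coefficients is not known from the commit message and is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical signature for an exact (slow) cost function. */
typedef uint32_t (*exact_cost_fn)(const int16_t *coeffs, size_t count);

/* Cheap estimate: sum of absolute coefficient values. */
static uint32_t sum_abs_coeffs_sketch(const int16_t *coeffs, size_t count)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < count; i++) {
    sum += (uint32_t)abs(coeffs[i]);
  }
  return sum;
}

/* Stand-in for CABAC coding: charges 8 units per nonzero coefficient.
 * Purely a placeholder so that the sketch is self-contained. */
static uint32_t placeholder_cabac_cost(const int16_t *coeffs, size_t count)
{
  uint32_t cost = 0;
  for (size_t i = 0; i < count; i++) {
    if (coeffs[i] != 0) cost += 8;
  }
  return cost;
}

/* Use the cheap estimate when QP is below the limit, otherwise fall
 * back to the exact cost function. */
static uint32_t residual_cost_sketch(const int16_t *coeffs, size_t count,
                                     int qp, int fast_limit,
                                     exact_cost_fn exact_cost)
{
  assert(coeffs != NULL && exact_cost != NULL);
  if (qp < fast_limit) {
    return sum_abs_coeffs_sketch(coeffs, count);
  }
  return exact_cost(coeffs, count);
}
```

The threshold on QP reflects the reasoning in the commit message: at high QPs few coefficients survive quantization, so the exact path is already cheap and the shortcut gains little.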
Reima Hyvönen
a4bf77f208
Tested some extract functions
2018-07-12 09:29:32 +03:00
Reima Hyvönen
c05033a893
Even more useless vectors removed
2018-07-11 15:09:14 +03:00
Reima Hyvönen
884cb77238
Removed some not used vectors
2018-07-11 15:06:11 +03:00
Reima Hyvönen
792689a5ff
Removed for-loops, added extract instead
2018-07-11 14:56:41 +03:00
Reima Hyvönen
f9c7f6ee66
Added some break-operations for avx2 optimization
2018-07-11 14:15:38 +03:00
Reima Hyvönen
cc064da143
Some more optimization for bipred
2018-07-11 11:27:54 +03:00
Reima Hyvönen
9a339eef89
Merge branch 'bipred_recon' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into HEAD
...
# Conflicts:
# build/kvazaar_lib/kvazaar_lib.vcxproj
2018-07-10 16:21:04 +03:00
Reima Hyvönen
a22cf03ddb
Added a no-movement function to the avx2 strategies
2018-07-10 16:07:15 +03:00
Arttu Ylä-Outinen
b7474eb532
Fix SAO buffer sizes
...
Increases sizes of buffers used for SAO reconstruction to avoid stack
buffer overflow in AVX2 SAO reconstruction.
2018-07-05 15:56:30 +03:00
Arttu Ylä-Outinen
b37470e80f
Merge pull request #207 from jbeich/maltivec
...
Unbreak build on PowerPC if AltiVec isn't supported
2018-07-04 11:06:41 +03:00
Reima Hyvönen
ea83ae45f0
Working solution
2018-07-03 11:18:51 +03:00
Jan Beich
4f4bea7496
Check -maltivec is supported before using
...
PowerPC target may lack or have non-standard FPU:
$ cc -dumpmachine
powerpcspe-undermydesk-freebsd
$ cc -c -maltivec -Isrc src/strategies/altivec/picture-altivec.c
src/strategies/altivec/picture-altivec.c:1: error: AltiVec and E500 instructions cannot coexist
2018-07-02 23:25:23 +00:00
Jan Beich
b892d820f8
Clean up macOS includes on powerpc* after 93e1c9f1c3
...
strategyselector.c:426:25: machine/cpu.h: No such file or directory
2018-07-02 21:52:45 +00:00
Reima Hyvönen
17babfffa4
25.6 working optimization, ~50% faster than original
2018-06-25 17:06:16 +03:00
Arttu Ylä-Outinen
2f995f4325
Merge pull request #205 from jbeich/powerpc
...
Unbreak build on non-Linux powerpc*
2018-06-19 13:28:00 +03:00
Arttu Ylä-Outinen
c1398ef818
Permit --period=1 with any GOP structure
...
All intra coding is a special case so it can be permitted even though
Kvazaar normally only supports intra periods that are divisible by the
GOP length.
2018-06-18 12:26:11 +03:00
Arttu Ylä-Outinen
abdebe0bf9
Fix --owf help message
...
The number of parallel frames is --owf plus one, not --owf minus one.
Fixes #204.
2018-06-18 09:33:36 +03:00
Jan Beich
93e1c9f1c3
Add AltiVec detection for BSDs
...
strategyselector.c:377:26: linux/auxvec.h: No such file or directory
2018-06-17 15:38:24 +00:00
Miika Metsoila
98972d26c2
Document that the high tier requires level 4 or higher
2018-06-14 12:41:03 +03:00
Miika Metsoila
62b44efaa4
Write the encoding tier (main/high) into the bitstream
2018-06-14 12:41:03 +03:00
Arttu Ylä-Outinen
a343f6d587
Prepare for delta QPs at CU-level
...
- Replaces lcu_dqp_enabled with max_qp_delta_depth in encoder_control_t.
- Fixes set_cu_qps so that it can handle quantization groups of
arbitrary size.
- Fixes computation of QP predictors so that it works for quantization
groups of arbitrary size.
2018-06-13 15:36:19 +03:00
Arttu Ylä-Outinen
dc6b2024ea
Modify reference count asserts to fix data races
...
Changes asserts on the reference count of objects to assert the value
after KVZ_ATOMIC_INC instead of directly checking the value. Fixes some
data races detected by TSan.
2018-06-12 09:35:07 +03:00
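The change described above can be illustrated with C11 atomics. This is a minimal sketch under stated assumptions, not Kvazaar's code: `refcounted_t` and `ref_acquire_sketch` are hypothetical names, and `atomic_fetch_add` stands in for the `KVZ_ATOMIC_INC` macro, whose definition is not shown here. The point is that the assertion tests the value produced by the atomic increment instead of reading the counter a second time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical reference-counted object, for illustration only. */
typedef struct {
  atomic_int_fast32_t refcount;
} refcounted_t;

/* Take a new reference and return the resulting count. */
static int32_t ref_acquire_sketch(refcounted_t *obj)
{
  /* atomic_fetch_add returns the value held before the addition,
   * so adding 1 gives the count this thread produced. */
  int32_t new_count = (int32_t)atomic_fetch_add(&obj->refcount, 1) + 1;

  /* The caller must already hold a reference, so the count is now at
   * least 2. Checking new_count avoids a separate read of the counter,
   * which other threads may be modifying at the same time. */
  assert(new_count > 1);
  return new_count;
}
```

Because the asserted value comes from the same atomic operation that modified the counter, no unsynchronized read is left for a race detector to report.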
Ari Lemmetti
4fb1c16c61
Add early termination for intra rdo when a zero coefficient block is found.
2018-06-08 21:03:07 +03:00
Ari Lemmetti
492529fb7a
Add the same comment to help message as well...
2018-05-30 14:13:15 +03:00
Ari Lemmetti
0d5972bf03
Add missing sort to intra transform split search so mode at 0 is the best
2018-05-21 13:10:38 +03:00
Sebastien Alaiwan
954bca7d6e
Fix memset parameter
2018-05-17 11:24:49 +02:00
Jaakko Laitinen
f9466efcbb
Close file on error
2018-05-15 11:50:16 +03:00
Reima Hyvönen
9fed29f950
Optimization for inter_recon_bipred
2018-04-18 15:25:44 +03:00
Arttu Ylä-Outinen
5c585c4fbc
Update help message
...
Updates the default option values to match the medium preset.
2018-04-03 10:40:37 +03:00
Arttu Ylä-Outinen
2b4e22111a
Update presets
...
The new presets are slower but have better coding efficiency.
2018-04-03 10:37:30 +03:00
Arttu Ylä-Outinen
7185519a1b
Update command line help
...
- Adds missing default values.
- Adds help for --crypto and --key.
- Adds help for --rd=3.
- Adds help for --sao options.
- Some changes to help wording.
2018-03-23 14:33:04 +02:00
Arttu Ylä-Outinen
3606860504
Add --no-cpuid option
...
Equivalent to --cpuid=0.
2018-03-23 12:32:27 +02:00
Arttu Ylä-Outinen
fb462b25ef
Fix transform skip for inter
...
The transform skip flag in cu_info_t was stored under the intra
substruct even though transform skip can be used for inter as well. This
caused bitstream errors. Fixed by moving the flag out of the substruct.
2018-03-20 11:01:33 +02:00
Arttu Ylä-Outinen
b64e46707d
Skip raster scan step in TZ search
...
Raster scan is very slow and the BD-rate improvement is marginal.
2018-03-01 14:04:03 +02:00
Arttu Ylä-Outinen
6877064230
Add zero neighborhood check to TZ search
...
Adds an additional grid search step that starts from the zero motion
vector after the normal grid search. The search range for this step is
half of the normal range.
2018-03-01 14:02:13 +02:00
Arttu Ylä-Outinen
74a413c46a
Switch to star refinement in TZ search
2018-03-01 13:06:14 +02:00
Arttu Ylä-Outinen
ebee428ee1
Add loop termination to TZ grid search
...
Terminates the grid search if no better motion vector was found in the
last three iterations.
2018-03-01 13:06:06 +02:00
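The termination rule described above can be sketched as a loop that counts consecutive iterations without improvement. Everything here is an assumption made for illustration: the names are hypothetical, the search distance is assumed to double on each iteration, and `toy_grid_cost` is a made-up cost function rather than a real motion estimation cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: best cost found on the grid at distance dist. */
typedef uint32_t (*grid_cost_fn)(int dist);

/* Toy cost with its minimum at distance 2; counts its own calls. */
static int toy_evaluations = 0;
static uint32_t toy_grid_cost(int dist)
{
  toy_evaluations++;
  int diff = dist > 2 ? dist - 2 : 2 - dist;
  return (uint32_t)(diff * 10 + 5);
}

/* Search distances 1, 2, 4, ... up to max_dist. Stop early once three
 * iterations in a row fail to improve on *best_cost.
 * Returns the best distance found, or 0 if nothing improved. */
static int grid_search_sketch(grid_cost_fn cost_at, int max_dist,
                              uint32_t *best_cost)
{
  assert(cost_at != NULL && best_cost != NULL);
  int best_dist = 0;
  int rounds_without_improvement = 0;

  for (int dist = 1; dist <= max_dist; dist *= 2) {
    uint32_t cost = cost_at(dist);
    if (cost < *best_cost) {
      *best_cost = cost;
      best_dist = dist;
      rounds_without_improvement = 0;
    } else if (++rounds_without_improvement >= 3) {
      break;
    }
  }
  return best_dist;
}
```

With `max_dist` set to 64 and `*best_cost` initialized to `UINT32_MAX`, the toy cost is evaluated at distances 1, 2, 4, 8 and 16 and the loop then stops, skipping distances 32 and 64.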
Arttu Ylä-Outinen
4c175621dd
Fix TZ grid search and star refinement
...
- Changes TZ grid search and star refinement to keep the origin constant
instead of moving to the best position after each iteration.
- Changes star refinement to loop until there is no more improvement,
instead of running the step only once.
2018-03-01 12:56:57 +02:00
Arttu Ylä-Outinen
9c2d0074a2
Add rounding of motion vectors in inter search
...
When the starting point for integer motion estimation was selected among
the merge candidates, the candidate motion vectors were always rounded
down. This commit changes the rounding so that they are rounded to the
nearest integer MV instead.
2018-03-01 09:39:21 +02:00
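The difference between the two rounding modes described above can be sketched as follows, assuming motion vectors are stored in quarter-pixel units so that integer positions are multiples of 4. The function names are hypothetical, and "rounded down" is interpreted here as rounding toward negative infinity; the actual expressions used in the encoder are not shown in the commit message.

```c
#include <assert.h>

/* Floor division by 4 that is well defined for negative values
 * (plain / truncates toward zero in C). */
static int floor_div4(int v)
{
  return (v >= 0) ? v / 4 : -((-v + 3) / 4);
}

/* Old behaviour (as interpreted here): always round down to the
 * integer position at or below the quarter-pel value. */
static int mv_round_down_sketch(int mv_qpel)
{
  return floor_div4(mv_qpel) * 4;
}

/* New behaviour: round to the nearest integer position.
 * Adding half a pixel (2 quarter-pels) before flooring does this;
 * exact halves round upward. */
static int mv_round_nearest_sketch(int mv_qpel)
{
  return floor_div4(mv_qpel + 2) * 4;
}
```

For a component of 7 quarter-pels (1.75 pixels), rounding down gives 4 (1 pixel) while rounding to nearest gives 8 (2 pixels), which is the closer starting point for integer motion estimation.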
Ari Lemmetti
662430d441
Select CU type based on SSD, transform unit tree and mode cost of luma and chroma on --rd=2
2018-02-22 19:26:48 +02:00
Arttu Ylä-Outinen
cb06cfeadb
Drop temporary arrays in bipred search
...
Changes bipred search to use the original source and reconstruction
arrays directly instead of copying them.
2018-02-14 11:20:51 +02:00
Arttu Ylä-Outinen
0ea516ba30
Move bipred search to a separate function
2018-02-14 09:56:53 +02:00
Arttu Ylä-Outinen
6f506be12d
Drop dynamic allocation from bipred search
...
Moves the temporary LCU struct used in bipred search from the heap to
the stack. The single malloc call was a huge bottleneck in bipred.
2018-02-14 09:55:02 +02:00
Arttu Ylä-Outinen
7155dd0db7
Add negative references to L1 list
...
Changes reference index list creation so that the negative references
are added to L1 in addition to L0 when biprediction is enabled and no
reordering of pictures is done. Biprediction can now be used with the
low-delay GOP structure.
2018-02-07 14:54:52 +02:00
Arttu Ylä-Outinen
4b24cd03a2
Update for crypto++ 6.0.0 compatibility
...
Changes the crypto module to use unsigned char instead of byte. The byte
typedef is no longer included in the global namespace in crypto++ 6.0.0.
See https://github.com/weidai11/cryptopp/issues/442 .
Fixes #184 .
2018-02-05 13:35:03 +02:00
Arttu Ylä-Outinen
8c53417006
Check zero coefficient cost for inter
...
Checks the cost of flushing all coefficients of an inter block to zero.
This is much faster than doing full RDOQ but can still reduce bitrate
significantly. Encoding speed is increased since fewer coefficient bits
have to be coded with CABAC.
2018-01-29 12:41:56 +02:00
Arttu Ylä-Outinen
018b5ffa64
Move inter CU reconstruction to a new function
...
Moves code for reconstructing all PUs in an inter CU to a new function
kvz_inter_recon_cu in inter.c.
2018-01-24 15:05:39 +02:00
Arttu Ylä-Outinen
405b8c1069
Refactor inter MVD cost functions
...
Moves duplicate code for writing the MVD of a single motion vector from
kvz_get_mvd_coding_cost_cabac and encoder_inter_prediction_unit to a new
function.
2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen
c1cca1ad7f
Refactor inter MV candidate selection
...
Moves duplicate code for checking the best MV candidate from functions
calc_mvd_cost, search_pu_inter_ref and search_pu_inter to a new
function.
2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen
9067aa4535
Remove an unnecessary copy in SMP/AMP search
...
SMP/AMP search is performed using a lower work tree level than the
normal inter search so the prediction info must be copied up if an
SMP/AMP mode is chosen. Previously pixels and coefficients were copied as
well. Changed to only copy prediction info.
2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen
89a930d6dd
Add part mode bitcost when using SMP/AMP blocks
2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen
fc43643ba5
Use a transform split for SMP and AMP blocks
2018-01-18 10:36:25 +02:00
Arttu Ylä-Outinen
c74ede148b
Fix CBF flags for 4x4 luma blocks
...
CBF flags were not being propagated to the upper level from blocks of
size 4x4.
2018-01-18 10:36:25 +02:00
Arttu Ylä-Outinen
0a69e6d18f
Fix selection of transform function for 4x4 blocks
...
DST function was returned for inter luma transform blocks of size 4x4
even though they must use DCT. Fixed by checking the prediction mode of
the block in addition to whether it is chroma or luma.
2018-01-18 10:36:25 +02:00
Miika Metsoila
bcedfd6669
Remove the usage of errno in me-steps argument parsing
2018-01-16 14:38:43 +02:00
Miika Metsoila
39ed36830e
Merge branch 'me_steps'
2018-01-16 14:22:59 +02:00
Miika Metsoila
61213e3ad9
Improve step parameter parsing and usage
2018-01-10 15:16:52 +02:00
Arttu Ylä-Outinen
649113a821
Fix inter search being used for 4x4 blocks
...
When 4x4 intra blocks are enabled and inter search is limited to 16x16
and larger blocks, it is possible that inter search is accidentally done
for 4x4 blocks. Fixed by checking that block size is at least 8x8 before
doing inter search.
2018-01-10 14:21:48 +02:00
Miika Metsoila
e8e0e7596a
Add a step-cutoff parameter for motion estimation search
2017-12-22 14:04:25 +02:00
Miika Metsoila
4e13608b01
Merge branch 'diamond_search'
2017-12-18 14:11:53 +02:00
Miika Metsoila
2cde0d1a18
Document diamond search option
2017-12-12 14:45:01 +02:00
Miika Metsoila
b923b63b42
Add diamond search
2017-12-12 14:40:14 +02:00
Ari Lemmetti
14892fda00
Replace simple coefficient cost estimation with CABAC. Substantial improvement.
...
The approximation proved to be too inaccurate while not actually giving that much speedup.
2017-12-10 01:23:48 +02:00
Miika Metsoila
ea79069dc8
Fix a type warning in encmain.c
2017-12-08 16:22:40 +02:00
Miika Metsoila
6aa4cd7528
Fix type warnings
2017-12-08 16:16:36 +02:00
Miika Metsoila
b3486b5114
Fix gcc/clang warnings and errors in cfg.c
2017-12-08 16:09:00 +02:00
Miika Metsoila
bac07457ea
Merge branch 'hevc_level'
2017-12-08 15:57:38 +02:00
Miika Metsoila
c67a24e6ec
Update readme and --help text
2017-12-07 12:32:46 +02:00
Ari Lemmetti
713e694d82
Define HAVE_STRUCT_TIMESPEC on Visual Studio 2015 and later
...
Fixes redefinition of timespec that Pthreads-Win32 does even if it has been already defined.
2017-12-05 18:26:12 +02:00
Miika Metsoila
f64d42169f
Improve bitrate checking to accommodate non-integer and less than 1 framerates
2017-12-01 17:20:12 +02:00
Miika Metsoila
57cf92d35f
Implement level's bitrate limit checking during encoding
2017-11-28 16:19:44 +02:00