Commit graph

3739 commits

Author SHA1 Message Date
Arttu Makinen df375a055e Small changes with VTM version 11.0. 2020-12-30 16:26:59 +02:00
Arttu Makinen 7109313161 Added forgotten memory release. 2020-12-30 16:26:50 +02:00
Arttu Makinen b17e26511f Removed/moved the last global variables from ALF. 2020-12-30 16:26:49 +02:00
Arttu Makinen f5556a5d69 Moved cabac_estimator from globals to alf_info_t. 2020-12-30 16:26:30 +02:00
Arttu Makinen ffdca81dca ALF frame buffer moved. 2020-12-30 16:26:22 +02:00
Arttu Makinen a3998450d0 Most of the remaining globals removed/moved. 2020-12-30 16:26:14 +02:00
Arttu Makinen 35233d2e17 Multiple global arrays placed in a struct of arrays.
Also g_ctb_distortion_unfilter and g_aps_id_start removed.
2020-12-30 16:25:54 +02:00
Arttu Makinen aed4d29c79 Continuation of removal/moving of ALF globals.
Removed/moved globals: g_ctu_enable_flag, g_ctu_alternative, g_ctu_enable_flag_tmp, g_ctu_alternative_tmp.
2020-12-30 16:25:40 +02:00
Arttu Makinen 335ce2bdda Moving ALF globals to alf_info struct inserted in videoframe_t.
g_alf_covariance and g_alf_covariance_frame moved.
2020-12-30 16:25:18 +02:00
Arttu Makinen 76cf8a16d9 Fixed a couple of memory-related bugs. 2020-12-30 16:25:01 +02:00
Arttu Makinen 0914864300 Bug fix for reading alf type to cfg. 2020-12-30 16:24:59 +02:00
Arttu Makinen 9d56d6444d Removed filter shape/type from variables and functions.
Filter shape/type size was only used and was always defined as 1.
2020-12-30 16:24:50 +02:00
Arttu Makinen 218d5b51d3 Cleaning ALF code. 2020-12-30 16:24:24 +02:00
Arttu Makinen 420ee4cc21 Changed alf_enabled and alf_cc_enabled flags into one alf_type enum as in sao. 2020-12-30 16:23:56 +02:00
Arttu Makinen 2b62b91589 Added CC ALF parameter for encoding. 2020-12-30 16:22:02 +02:00
Arttu Makinen 0e74bfb2a8 CC ALF now works properly. 2020-12-30 16:22:01 +02:00
Arttu Makinen fc39b311bd Added fixing of pixels outside of the actual frame before CC ALF. 2020-12-30 16:22:01 +02:00
Arttu Makinen 99745c2e5a Added writing of CC ALF flag. Couple of bug fixes. 2020-12-30 16:22:00 +02:00
Arttu Makinen 1471448218 Bug fixes in derive_cc_alf_filter and get_blk_stats_cc_alf. 2020-12-30 16:22:00 +02:00
Arttu Makinen f7fe8d9a27 Added more CC ALF functions.
Currently not working.
2020-12-30 16:21:59 +02:00
Arttu Makinen 9ed5169919 Finished functions get_blk_stats_cc_alf and calc_covariance_cc_alf for CC ALF. 2020-12-30 16:21:29 +02:00
Arttu Makinen bf8bb62e50 Got rid of a fair amount of global variables. 2020-12-30 16:21:28 +02:00
Arttu Makinen 7846796a4e Removed #define FULL_FRAME. 2020-12-30 16:20:25 +02:00
Arttu Makinen 7bfb1ca6b4 Removal of useless comments. 2020-12-30 16:19:57 +02:00
Arttu Makinen 529bdb4dd2 Modify APS header writing. 2020-12-30 16:19:47 +02:00
Arttu Makinen ee70bcfaec Fixing warnings. 2020-12-30 16:19:07 +02:00
Arttu Makinen d7eafc391f Fixing uninitialized parameters. 2020-12-30 16:18:24 +02:00
Arttu Makinen 36ffdcaf3f Disable output of debug stats. 2020-12-30 16:18:09 +02:00
Arttu Makinen 98768061db Adding CC ALF. 2020-12-30 16:18:08 +02:00
Arttu Makinen da04fffaec Updated the creating of ALF parameters and init for them. 2020-12-30 16:17:54 +02:00
Arttu Makinen bfa77e35c3 Fixed a bug where reconstruction for ALF was called multiple times for no reason.
Modified reconstruction of pixels after ALF search.
2020-12-30 16:17:43 +02:00
Arttu Makinen bd292dab16 Fixed coding of headers for inter coding with ALF. 2020-12-30 16:15:12 +02:00
Arttu Makinen 26dc5b8c4e Multiple APSs can now be signaled.
Can't test usage of multiple APSs properly because inter coding doesn't work.
2020-12-30 16:13:56 +02:00
Arttu Makinen 4ffb0b71a6 Chroma filtering works.
Also some code cleaning.
2020-12-30 16:13:25 +02:00
Arttu Makinen a95fd73668 At least one APS can be signaled.
Problem with APS was in encoder_state-bitstream.c.
Cleaning of code.
2020-12-30 16:12:56 +02:00
Arttu Makinen d7126520b2 Moving param_set_map from slices to cfg.
Bug fix in kvz_alf_encoder_ctb.
2020-12-30 16:12:38 +02:00
Arttu Makinen c55a2a04e8 Bug fix in kvz_alf_encoder.
New bugs appeared with this fix.
2020-12-30 16:12:17 +02:00
Arttu Makinen 8aa91f320a Bug fixes and cleaning. 2020-12-30 16:11:36 +02:00
Arttu Makinen bfba8d43cb Working on getting APS working for ALF. 2020-12-30 16:10:01 +02:00
Arttu Makinen b3ecc755e2 ALF search is now executed for full frame. Works for only 1 frame.
Checksum matches.
APSs are not used currently.
#define FULL_FRAME in alf.h is set to 1 in order to use ALF for full frame.
#define FULL_FRAME 0 produces working bitstream but checksum doesn't match.
2020-12-30 16:08:46 +02:00
Arttu Makinen 94787acb73 Divided encoder_state_worker_encode_lcu -function in encoderstate.c into encoder_state_worker_encode_lcu_search and encoder_state_worker_encode_lcu_bitstream.
ALF off. No changes in bitstream.
2020-12-30 16:07:46 +02:00
Arttu Mäkinen ec62ed89cb LCUs now have mismatches only on boundaries.
Fixed a bug in alf.c line 5451.
Modifications to copying the boundary pixels of CTU.
2020-12-30 16:07:45 +02:00
Arttu Mäkinen f202aa43fa WIP Updating VTM8.2 to VTM10.0.
Small update to ALF cabac flags.
Minor variable definition updates.
2020-12-30 16:07:44 +02:00
Arttu Mäkinen bc90b731a5 ALF updated to VTM8.2. Checksum doesn't match.
ALF uses currently only ready defined coefficients, not APSs.
Produces a valid bitstream, but checksum doesn't match.
CC ALF is disabled.
2020-12-30 16:06:59 +02:00
Arttu Mäkinen 2f80216514 Some cleaning and updating.
Set to use only existing filters rather than signal APS.
2020-12-30 16:02:01 +02:00
Arttu Mäkinen a430d48669 ALF works now with VTM7.0 as in VTM6.1.
VTM properly decodes bitstream from kvazaar but the checksum doesn't match.
A couple of hard-coded values are needed for this in function "kvz_encode_alf_bits".
2020-12-30 15:59:08 +02:00
Arttu Mäkinen 7250f4549b Merge fixes. 2020-12-30 15:12:32 +02:00
Arttu Mäkinen 21a4751875 Works with VTM decoder with one frame with one hard coded value.
APS NAL unit type writing added.
Bug fixes.
WIP.
2020-12-30 15:11:17 +02:00
Arttu Mäkinen 9cad95c94c Bug fixes.
WIP.
2020-12-30 15:09:13 +02:00
Arttu Mäkinen 09c68d9de6 Outputs valid frame with kvazaar. Still problems with cabac when decoding with VTM.
Decided to use buffers that were added in last commit.
Some small fixes and adjustments.
WIP.
2020-12-30 15:09:12 +02:00
Arttu Mäkinen 2cac901cca Testing different kind of buffer for alf image fulldata.
WIP
2020-12-30 15:09:12 +02:00
Arttu Mäkinen feb201986a Changed to process one CTU at a time rather than all CTUs.
WIP
2020-12-30 15:09:11 +02:00
Arttu Mäkinen b04bb66160 Adjustments and cleaning.
WIP
2020-12-30 15:09:10 +02:00
Arttu Mäkinen c76c445142 Cabac/ctx calculation added.
Bug fixing and adjusting.
WIP
2020-12-30 14:32:01 +02:00
Arttu Makinen ade4fc4061 Update of contexts of ALF.
WIP
2020-12-30 14:32:00 +02:00
Arttu Makinen ebb99a7223 Changed 'width's to 'stride's, because added more pixels to 'fulldata'.
Also some small fixes and changes.
Checksum correct in luma.
WIP
2020-12-30 14:30:47 +02:00
Arttu Makinen 377aa989ab Updated to VTM6.1.
Done according to all #ifs enabled
2020-12-30 14:27:15 +02:00
Arttu Makinen 0fbbf1a7e2 Small fixes/adjustments 2020-12-30 14:25:58 +02:00
Arttu Makinen 98a8e78e93 avx2/encode_coding_tree-avx2.c update, because it caused errors 2020-12-30 14:25:16 +02:00
Arttu Makinen ed76650fa5 Updating to VTM6.0 2020-12-30 14:25:09 +02:00
Arttu Makinen a24f49c286 Doesn't crash anymore during debug. Added new allocator for fulldata in kvz_picture. 2020-12-30 14:24:16 +02:00
Arttu Makinen 2b7a8af23a Crashes now in kvz_image_free. 2020-12-30 14:22:38 +02:00
Arttu Makinen 05495bb555 Not working. All the functions done.
Heap corruption occurs during debugging.
2020-12-30 14:22:30 +02:00
Arttu Mäkinen 236224dbb9 Broken version with header mismatch 2020-12-30 14:07:34 +02:00
Arttu Mäkinen 06233b5d3b added alf parameter to cli 2020-12-30 14:02:58 +02:00
Jaakko Laitinen 71751c3770 Fix max filter size derivation 2020-12-29 17:57:35 +02:00
Jaakko Laitinen 6a8d73252a Fix runtime errors 2020-12-28 16:41:00 +02:00
Jaakko Laitinen 85be89a85c Fix compilation errors 2020-12-28 15:20:30 +02:00
Jaakko Laitinen 95ff22f0db Finish max filter length fixes 2020-12-28 14:26:36 +02:00
Jaakko Laitinen 13e605153a Fix bugs 2020-12-22 19:11:47 +02:00
Jaakko Laitinen 50e9acd3f4 Add max filter length derivation 2020-12-21 18:47:02 +02:00
Arttu Makinen bc8507cc8d MTS context. 2020-12-18 18:35:11 +02:00
Arttu Makinen fd2f73b460 MTS headers and commands. 2020-12-18 17:40:47 +02:00
Jaakko Laitinen 7a71b700fb Add chroma deblock filtering 2020-12-18 11:06:41 +02:00
Marko Viitanen 0c5e1db0fa Fix wpp chroma bug 2020-12-15 22:59:22 +02:00
Marko Viitanen 071fe7fd51 Limit the top-right intra references when wpp is turned on
Chroma hash still fails.
2020-12-15 22:33:32 +02:00
Marko Viitanen 6146610ec8 Fix the wpp sync point to be the first LCU 2020-12-15 14:51:46 +02:00
Jaakko Laitinen 78be0ccd05 Fix chroma deblocking logic 2020-12-15 14:10:09 +02:00
Marko Viitanen c07a56179f Fix Hash SEI message for VTM11.0 2020-12-15 13:47:28 +02:00
Arttu Makinen 30c4065dc0 Headers for threading. 2020-12-15 13:04:39 +02:00
Jaakko Laitinen 6128db961a Finish up large block filtering 2020-12-11 19:34:56 +02:00
Jaakko Laitinen 976d1c8812 Start implementing large block filtering 2020-12-10 18:03:18 +02:00
Jaakko Laitinen 33cea17484 Add logic for large block filtering 2020-12-09 19:10:38 +02:00
Jaakko Laitinen d3d55933b2 Finish up strong filtering condition check 2020-12-08 18:38:05 +02:00
siivonek e833354cdd Merge branch 10-bit-assert-fix 2020-12-07 20:36:50 +02:00
Jaakko Laitinen 5a90deb678 Add initial max filter length and large block stuff 2020-12-07 18:54:43 +02:00
Jaakko Laitinen 03dade8246 Prepare for large blocks 2020-12-04 18:31:48 +02:00
Jaakko Laitinen 7b0b864947 Fix mvd thresholds and tc/beta index calculations 2020-12-04 15:54:40 +02:00
Jaakko Laitinen 8f3de705eb Add todo list of things to check 2020-12-01 13:53:52 +02:00
Pauli Oikkonen be19fd996b Add default value for fast coeff table filename
..oops
2020-11-02 14:02:51 +02:00
Pauli Oikkonen 46301e9857 Document the --fast-coeff-table option 2020-10-29 15:23:26 +02:00
Pauli Oikkonen 816789c9f4 Allow fast coeff weights to be read from a file 2020-10-29 15:22:51 +02:00
Pauli Oikkonen 6799019db0 Move fast coeff table to transform.h
Guess this is a more logical place for it
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 4712ce5f59 Round the fast coeff result instead of flooring 2020-10-29 15:20:27 +02:00
Pauli Oikkonen 0fb09c9920 New filtered coeff weight by QP values 2020-10-29 15:20:27 +02:00
Pauli Oikkonen 9bf0cb27b1 Constrain fast cost estimation to QPs we have weights for 2020-10-29 15:20:27 +02:00
Pauli Oikkonen 24d487f553 New weights for 12 <= QP <= 42
Trained using MSU ultrafast settings now
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 3e1c6d84b8 Fix issues in fast coeff estimation
Allow weight table to start from nonzero QP, and round weights to Q8.8
instead of flooring them
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 5f91bda762 Use newer data for fast coeff cost estimation
Same training dataset, but this time only buckets 0...3 were used to
approximate the function, no sign/cg width bucket.
2020-10-29 15:20:27 +02:00
Pauli Oikkonen 2abd733199 Use unsigned min() to correctly clip -32768
If a coeff happens to be -32768 (0x8000), its 16-bit abs() is also
0x8000. It should ultimately be clipped to 3, so interpret absolute
values as unsigned instead to make that happen.
2020-10-29 15:20:27 +02:00
Pauli Oikkonen b93b90c0d7 Implement new fast coeff cost estimator in AVX2 2020-10-29 15:20:27 +02:00
Pauli Oikkonen 2f74a112b3 Try first lookup table based fast coeff estimation 2020-10-29 15:20:27 +02:00
Marko Viitanen 2db3a07b14 Prevent cu_sig_model_chroma array from being indexed over the limit 2020-10-13 14:14:57 +03:00
Marko Viitanen f4948dda6f Fix array size for bdpcm_mode[] 2020-10-13 12:51:20 +03:00
Marko Viitanen 9e3e8f51f6 Change kvz_g_tc_table_8x8 from uint8_t to uint16_t to fit all the values 2020-10-13 12:05:27 +03:00
Marko Viitanen 26f4f45c6d Use correct pred_mode cabac models -> fixes inter cabac bits 2020-10-13 12:04:31 +03:00
Marko Viitanen 5a6806cbf7 [CI] Limit testing parameters to those that work 2020-10-09 09:37:15 +03:00
Marko Viitanen 3c7eb55292 Disable output of cabac debug when in "count only" mode
- Some code cleanup
2020-10-09 08:45:43 +03:00
Marko Viitanen fa25621c77 Force certain intra modes off 2020-10-09 08:44:40 +03:00
Marko Viitanen 54b8fd054d Fix Chroma QP scaling issue 2020-10-02 15:40:23 +03:00
Marko Viitanen 11229997b6 Fix NAL header layer_id 2020-10-01 11:10:40 +03:00
siivonek bc1206a4d3 Define qp_delta_min & max in global.h instead of calculating them locally. 2020-09-29 13:46:27 +02:00
Marko Viitanen ac2032eb65 Fixing P/B frame headers and debug output formatting 2020-09-28 14:58:07 +03:00
Marko Viitanen bddfb47a55 Merge remote-tracking branch 'remotes/kvazaar_github/master' 2020-09-25 11:49:11 +03:00
Marko Viitanen 551a3991cf Cleanup headers 2020-09-24 09:31:44 +03:00
siivonek 0f3ef786b9 Modify delta QP range assert so it will work with any valid bit depth. Modify VAQ code so it will clip the QP to a proper range which is dependent on bit depth 2020-09-22 20:15:23 +02:00
siivonek fe6f93a951 Fix delta QP range check assert. Add separate asserts based on bit depth. 2020-09-22 20:15:22 +02:00
Marko Viitanen 449975b0fb Fixed cubic filter usage in intra angular modes 2020-09-21 14:58:34 +03:00
Joose Sainio 8143ab971c Merge branch 'stats-files'
# Conflicts:
#	src/cfg.c
#	src/cli.c
#	src/kvazaar.h
2020-09-16 09:25:00 +03:00
Joose Sainio 1c06bd7f3d Fix POC to be correct for all GOPs and Intra periods, fix issue with vaq 2020-09-14 14:25:48 +03:00
Sami Ahovainio 4d87fb2397 fixed potential out of bounds iteration 2020-09-10 12:59:39 +03:00
Sami Ahovainio 5d521a2444 Added option to force yuv as file format and made the options and file endings case insensitive 2020-09-09 16:05:59 +03:00
Joose Sainio 3fb8b7ebc6 Add --stats-file-prefix option
When the option is given a prefix, four files (prefixlambda.txt,
prefixqp.txt, prefixdist.txt, and prefixbits.txt) are produced that hold the
corresponding data for each CTU. This is a debug feature.
2020-09-09 12:35:47 +03:00
Sami Ahovainio 84cabd9c20 Fixed sign match 2020-09-07 15:39:31 +03:00
Sami Ahovainio d691849594 Added frame header reading for both read and seek functions 2020-09-07 15:31:08 +03:00
Sami Ahovainio cbcee67821 y4m start header parsing ready 2020-09-07 15:31:07 +03:00
Joose Sainio c10b841e7c Merge remote-tracking branch 'remotes/origin/fix-sao-parameter' into master 2020-09-07 13:10:36 +03:00
Joose Sainio da09d49890 Remove optionality from --sao
The SAO parameter was optional, so passing an argument required using "=",
which is confusing because no other parameter requires it.
2020-09-07 12:35:40 +03:00
Pauli Oikkonen 3f7f0d7ed7 Allow bit depth to be defined from the outside
For a 10-bit build, just use:
env CFLAGS="-DKVZ_BIT_DEPTH=10" ./configure && make clean && make
2020-09-02 17:55:22 +03:00
Pauli Oikkonen 780da4568a Exclude 8-bit-only code from 10-bit builds and use uint8_t instead of kvz_pixel for code that assumes 8-bit pixels 2020-09-02 17:46:33 +03:00
Pauli Oikkonen 31ef4e4216 Fix ml functions to accept kvz_pixel*, not uint8_t* 2020-09-02 17:46:33 +03:00
Marko Viitanen 574c4d06ee Fix use of log2_cg_size in coeff coding -> smaller blocks also decoded correctly 2020-08-27 18:26:16 +03:00
Marko Viitanen b3f3a9eae6 Add two EOS NAL units at the end of each picture to make intra sequence work 2020-08-25 15:30:21 +03:00
Marko Viitanen b7638172ca Use continuous POC for all intra and add aud_irap_or_gdr_au_flag 2020-08-25 11:53:55 +03:00
Marko Viitanen b53b53ed09 Fixed SAO headers, SAO produces valid output 2020-08-20 15:37:29 +03:00
Marko Viitanen b4907e6337 Fix deblocking headers and some cleanup, deblocking does not produce valid output 2020-08-20 15:25:18 +03:00
Arttu Mäkinen 4da90b3722 Update of contexts. 2020-08-17 18:18:35 +03:00
Arttu Mäkinen 232332dc5f Update of contexts. 2020-08-17 14:23:26 +03:00
Marko Viitanen 2fc8558926 Set correct profile, level and inter flags in IDR 2020-08-17 11:51:57 +03:00
Marko Viitanen 0f8ada02c4 Fix VPS writing 2020-08-17 11:26:09 +03:00
Arttu Mäkinen da9f542209 WIP updating VTM8.2 to VTM10.0rc 2020-08-17 10:27:03 +03:00
Joose Sainio faf5cc858d Merge branch 'fix-lp-gop-rc' 2020-06-25 09:41:57 +03:00
Joose Sainio 138651ee85 Fix the bit and frame counts for calculating the gop allocation
Additionally dynamically adjust the smoothing window if there are rapid changes
2020-06-24 15:26:54 +03:00
Ari Lemmetti f8ff6dd567 Merge pull request #262 from jbeich/truncate-freebsd
Unbreak build on FreeBSD
2020-06-22 18:08:01 +03:00
Ari Lemmetti d1abf85229 Add MV constraint check to motion estimation start point 2020-06-01 23:51:38 +03:00
Marko Viitanen 20b66c9949 Sync to VTM 8.2 and add separate height to last_sig coding 2020-04-29 08:52:38 +03:00
Jan Beich 1fa69c705d Rename truncate() from 30ce461d98 to avoid conflict with POSIX version
strategies/avx2/dct-avx2.c:55:23: error: static declaration of 'truncate' follows non-static declaration
static INLINE __m256i truncate(__m256i v, __m256i debias, int32_t shift)
                      ^
/usr/include/stdio.h:448:6: note: previous declaration is here
int      truncate(const char *, __off_t);
         ^
2020-04-22 16:09:42 +00:00
Ari Lemmetti 9753820b3a Update version to 2.0.0 2020-04-22 01:03:36 +03:00
Ari Lemmetti 40e81f3243 Update preset tables. Update docs. 2020-04-22 01:03:21 +03:00
siivonek 54f438a75c Update VAQ help text. Update docs. Change some lingering tabs to spaces. 2020-04-20 16:52:07 +02:00
Marko Viitanen 86d76b19a4 Fix intra neighboring block selection and clean some unused code 2020-04-16 14:12:40 +03:00
Marko Viitanen 27b4dd50f8 Fix picture header to code Inter frame 2020-04-14 08:24:11 +03:00
Ari Lemmetti f31dddc019 Bypass inverse quantization and inverse transform when trying early skip 2020-04-10 16:02:09 +03:00
Pauli Oikkonen fbdb1e2d15 Add correct path to sao_shared_generics.h in makefile 2020-04-08 19:27:12 +03:00
Pauli Oikkonen 8617530b13 Use _mm_store_epi64 instead of _mm_cvtsi128_si64
Fix 32-bit builds that tend to lack the cvt intrinsic. Hope it will be
optimized to a movq r64, xmm on modern platforms though
2020-04-07 23:51:54 +03:00
Pauli Oikkonen a82966c0f5 Fix lacking _mm256_cvtss_f32 intrinsic on VS
Cast __m256 into __m128 first, the XMM variant of the intrinsic has been
around for a long enough time to be supported
2020-04-07 22:38:10 +03:00
Marko Viitanen 27ffba2c9c Fix terminating bit condition at the end of the slice 2020-04-07 15:30:02 +03:00
Marko Viitanen e737a878a6 Fix split flags and remove an extra terminating bit 2020-04-07 09:57:30 +03:00
Joose Sainio c369ff8873 Fix a potential division by zero in a floating point operation
When C is calculated with K if the value of K is not clipped before in some
cases it is possible that K gets such a large negative value that bpp^K is
rounded to zero. In real-life cases this is extremely rare and clipping
beforehand has very little to no effect.

Also remove commented debug prints
2020-04-06 11:05:49 +03:00
Ari Lemmetti 901c25c0c8 Merge branch 'vaq' 2020-04-03 19:51:17 +03:00
Ari Lemmetti 51451be5ef Handle cases where the number of pixels is not divisible by 32 2020-04-03 19:37:47 +03:00
siivonek ee544304f1 Make function static to not mess up tests. 2020-04-03 15:22:34 +02:00
siivonek e5267f7706 Fix define for use with Visual Studio. 2020-04-03 15:11:01 +02:00
siivonek 9e34369304 Merge branch 'vaq' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into vaq 2020-04-03 12:35:04 +02:00
siivonek d025977949 Clamp edge lcu pixels if dimensions are not 64 divisible. 2020-04-03 12:33:14 +02:00
Pauli Oikkonen addc1c3ede Fix warning about potentially unused hsum_8x32b
There's a lot of alternative options available, such as making it
globally visible with a kvz_ prefix, force inlining it, or anything.
This could be good too, hope it won't be compiled at all to translation
units where it's not used.
2020-04-02 16:44:22 +03:00
siivonek e3ba0bfb8c Fix memory leak. 2020-04-02 14:15:36 +02:00
siivonek 566680af7b Move function hsum to file where it is used to avoid errors. 2020-04-02 14:03:06 +02:00
siivonek 58be514e2a Fix pipeline error. 2020-04-02 13:50:08 +02:00
siivonek 2aa0d97589 Add VAQ test in test_tools. Bump minor version number in configure.ac. Update help text for VAQ. 2020-04-01 18:16:39 +02:00
siivonek c6e421019e Merge vaq-simd 2020-03-31 21:40:29 +02:00
Jaakko Laitinen 8e4b738900 Fix error when first value in pu depth list is omitted 2020-03-31 16:57:12 +03:00
Jaakko Laitinen 54ef0bbfd2 Fix unintended functionality when giving multiple --pu-depth-intra/inter list parameters 2020-03-31 16:39:56 +03:00
Jaakko Laitinen cb0c7b23b5 Merge branch 'intra_qp_offset_auto' into 'master'
Add auto option to intra-qp-offset

See merge request TIE/ultravideo/kvazaar!7
2020-03-31 16:17:36 +03:00
Pauli Oikkonen 99889dab15 Fix switch(bool) in picture-avx2.c
It passes on GCC but warns on Clang
2020-03-31 15:42:19 +03:00
Jaakko Laitinen e0440c3de1 Update docs 2020-03-31 15:27:48 +03:00
Jaakko Laitinen 7760dcf441 Remove intra qp offset from preset parameters 2020-03-31 14:06:07 +03:00
Jaakko Laitinen 8bd1a2b667 Update help message 2020-03-31 13:19:05 +03:00
Jaakko Laitinen b4f5486190 Set intra qp offset default to auto 2020-03-31 12:58:40 +03:00
Jaakko Laitinen 740688c67d Add auto option to intra qp offset 2020-03-31 11:56:44 +03:00
Marko Viitanen a0af87bdc0 Update contexts to match VTM 8.0 2020-03-30 14:34:50 +03:00
Marko Viitanen d36ba85861 Fixed PPS and slice header to match VTM 8.0 (only for I-Frame!) 2020-03-30 12:55:12 +03:00
Marko Viitanen 64b9177cf0 Fix SPS to match VTM 8.0 2020-03-30 09:56:38 +03:00
Pauli Oikkonen 0c7bfa7dc9 Fix AVX2 on Clang
Besides just -mavx2, AVX2 support depends on a couple minor instruction
set extensions that should always exist on AVX2-capable hardware. Too
bad the different bit twiddling instructions are invoked slightly
differently between GCC and Clang, but now Clang seems to also produce
an AVX2-capable build.
2020-03-26 18:48:48 +02:00
siivonek 89d3e674ce Comment out code which possibly messes up OBA 2020-03-26 17:49:31 +02:00
siivonek be7d9ddec5 Fix error in frame variance calculation. Chroma channels were not added to variance 2020-03-26 14:33:00 +02:00
Marko Viitanen 8908324df8 Fix PTL DPB HDR param headers to match VTM 8.0 2020-03-26 10:40:27 +02:00
Marko Viitanen d622ebb1f4 Fix NAL types to match VTM 8.0 2020-03-26 10:39:35 +02:00
Jaakko Laitinen 45ca8f8113 Merge branch 'master' into 'extended_pu-depths' 2020-03-25 15:11:08 +02:00
siivonek 5986e71535 Fix mistake 2020-03-20 13:43:44 +02:00
Jaakko Laitinen d6ffe9e495 Update docs 2020-03-20 13:27:07 +02:00
Jaakko Laitinen 621450cc1d Update --help 2020-03-20 13:07:48 +02:00
Jaakko Laitinen aaac3df69b Add prefix to kvazaar.h define 2020-03-20 09:04:00 +02:00
siivonek 2a85be5752 Move qp_to_lambda so it is defined before use. Change some tabs to spaces 2020-03-19 22:13:53 +02:00
siivonek 0a4ce3c0aa Add vaq to new rate control 2020-03-19 21:43:52 +02:00
siivonek 1bbc598d75 Merge branch 'master' into vaq 2020-03-19 20:19:43 +02:00
Joose Sainio b53911d637 Merge branch 'rc-intra' 2020-03-19 13:34:15 +02:00
Joose Sainio a304a8ea6e Add weights for GOP 16 based on fitting a power curve to bits spent by HM 2020-03-19 11:13:43 +02:00
Joose Sainio e823ac1dae miscellaneous fixes
- bump library version
- add help desk for --clip-neighbour
- update the default values of --clip-neighbour and --intra-bits
- update tests to more sensible
2020-03-19 10:47:28 +02:00
Jaakko Laitinen b2ddba38c2 Set correct size for pu-depth min/max data structure 2020-03-19 09:29:43 +02:00
Joose Sainio 2c345bc3cf try to fix tsan issue 2020-03-18 14:58:54 +02:00
Jaakko Laitinen fe428dcbe1 Fix no gop functionality 2020-03-18 11:03:33 +02:00
Jaakko Laitinen af3d559d8d Let pu-depth be defined per gop-layer 2020-03-17 17:57:18 +02:00
Ari Lemmetti cbd77944d8 Costs in rough intra search may be negative. Get rid of UBSan error. 2020-03-16 22:13:14 +02:00
Ari Lemmetti aa0ade3f65 Cast values to unsigned to make UBSan not trigger due to left-shifting negatives 2020-03-16 19:52:34 +02:00
RLamm 27fe716654 Fixed reference POC indexing 2020-03-11 15:33:37 +02:00
RLamm bf24831780 Attempt to fix random crashes 2020-03-11 15:31:47 +02:00
RLamm 887659db1f Attempted to scale the extra_mvs 2020-03-11 15:31:46 +02:00
siivonek 8d9719ff90 Merge branch 'master' into vaq 2020-03-05 14:17:01 +02:00
Joose Sainio c9a8f2a596 Completely disable intra based model for frame 1 2020-03-04 12:52:13 +02:00
Joose Sainio 19c79c3e58 don't use the intra frame based estimation if the result is bad 2020-03-04 09:26:22 +02:00
Ari Lemmetti 7b7358c25a Update presets veryslow and placebo a bit
Both use now --gop 16, --intra-qp-offset -3, --me tz, and --transform-skip
2020-03-03 20:41:01 +02:00
Pauli Oikkonen 60e7956dc5 Disable inaccurate integer variance calculation for now 2020-03-02 19:18:55 +02:00
Pauli Oikkonen fc1b91335b Implement variance calculation in integer math
Maybe this is a bit faster than FP, it's not accurate though
2020-03-02 18:17:18 +02:00
Pauli Oikkonen 35c825c75f Move hsum_8x32b to avx2_common_functions 2020-02-27 17:52:17 +02:00
Pauli Oikkonen b00ac7d1c4 AVX2 version of buffer variance calculation 2020-02-25 15:57:56 +02:00
siivonek a380e43bda Add chroma channels to variance calculation. 2020-02-24 19:54:34 +02:00
Pauli Oikkonen 1bd9c6dd93 Make a strategy out of pixel_var 2020-02-24 19:37:36 +02:00
Pauli Oikkonen 86ebf366e1 fix typo 2020-02-24 18:18:10 +02:00
Joose Sainio f81de41775 Merge branch 'master' into rc-intra 2020-02-24 15:30:57 +02:00
siivonek 5688bcd646 Merge branch 'master' into vaq 2020-02-21 17:11:10 +02:00
siivonek 908ecb1767 Add rounding to aq offsets. Fix typo 2020-02-21 13:51:43 +02:00
Ari Lemmetti 1dfc69b42e Consider merge index bits in merge analysis and early skip 2020-02-20 09:43:58 +02:00
Joose Sainio 7deb22c8e8 Merge branch 'master' into rc-intra 2020-02-19 15:01:04 +02:00
Kari Siivonen (TAU) c972ca9067 Add assert to check if deltaQP out of bounds. Clip adaptive QP to [-13, 12]. 2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU) f07990794f Fix error in vaq pixel blit range calculation 2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU) 57ed40c263 Fix application of aq offset 2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU) be2f420d61 Change: vaq requires parameter. Parameter defines vaq strength ex. 15 == 1.5 2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU) bf1b2c1e22 Add define for vaq strength parameter 2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU) 150559a7e8 Fix bugs. Enable set_qp_in_cu when using vaq 2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU) c8c71274ee Change tabs to spaces. 2020-02-18 13:20:26 +02:00
siivonek 888382953d Implement calculation of vaq values. Values not used yet. 2020-02-18 13:20:25 +02:00
siivonek ad40a88c09 Add no-vaq option to vaq 2020-02-18 13:20:25 +02:00
siivonek 09f0a1c52e Fix typo in comment 2020-02-18 13:20:25 +02:00
siivonek 84fb3fd7d1 aq: Add --vaq commandline option 2020-02-18 13:20:25 +02:00
Joose Sainio 2a98f5db1e fix intra-bits for lp-gop 2020-02-18 10:38:29 +02:00
Ari Lemmetti 71d9327f62 Further improve fast bipred 2020-02-17 20:32:52 +02:00
Ari Lemmetti 80c26870d5 Update docs 2020-02-15 23:29:18 +02:00
Ari Lemmetti ebb183cc01 Add option to make intra QP offset configurable 2020-02-15 22:54:48 +02:00
Ari Lemmetti be3e08d6db Add gop.h to Makefile 2020-02-15 22:54:47 +02:00
Ari Lemmetti 1354acd358 Prevent negative values being written to SPS with --gop=0 2020-02-15 22:54:47 +02:00
Ari Lemmetti fe4869916c Disable GOP and intra qp offset for all-intra coding automatically 2020-02-15 22:54:46 +02:00
Ari Lemmetti 9849fb7c77 Enable experimental rate control for GOP 16 2020-02-15 22:54:46 +02:00
Ari Lemmetti a0a22dec8a Remove deprecated / unused lambda adjustments 2020-02-15 22:54:46 +02:00
Arttu Ylä-Outinen 829a70e6a7 Copy lowdelay GOP definition from HM 2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen 28f99c0b87 Change definition of 8-GOP to match HM 2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen 636fa8fbdd Fix maximum decoded picture buffer size 2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen ebd5156db5 Add definition for random access GOP of length 16 2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen 6653f06dd0 Only compute GOP layer weights when RC is enabled 2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen c8fff1e0d6 Use a larger number of bits for POC lsb when needed
Changes the number of bits used for coding the least significant bits of
the POC based on the GOP size.
2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen d757a832c2 Change GOP QP offset handling to match HM
Adds fields qp_model_scale and qp_model_offset to kvz_gop_config and
intra_qp_offset to kvz_config.
2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen f37dcd5879 Move GOP definition to a separate file
Moves definition of the 8-GOP from cfg.c to gop.h.
2020-02-15 22:36:55 +02:00
Ari Lemmetti 6e1007a3e7 Get rid of LAMBA! (Commit #3000) 2020-02-15 22:32:52 +02:00
Ari Lemmetti 0c02e71b43 Remove minor error from readme 2020-02-15 22:29:08 +02:00
Joose Sainio e90d3141a2 Merge branch 'master' into rc-intra 2020-02-05 11:06:56 +02:00
Ari Lemmetti 9a0236bb4e Add option 'zero-coeff-rdo' 2020-02-04 21:26:29 +02:00
Ari Lemmetti 886ff36d12 Initial implementation of fast bipred. 2020-02-04 15:46:23 +02:00
Ari Lemmetti 3c7dd0752f Remove the broken "no mov" branch.
Causes hash mismatches for example in SlideShow sequence.
2020-02-03 15:26:31 +02:00
RLamm bf8941ddb8 Added comment about partial-coding usage 2020-01-31 16:19:48 +02:00
RLamm b8488ab48d Changed "partial-coding" variables to uint32_t 2020-01-31 16:02:29 +02:00
RLamm 76e3249754 Changed parameter "slicer" to "partial-coding" to avoid confusion. 2020-01-31 14:22:32 +02:00
RLamm 30d5df40c5 Custom headers for the distributed coding 2020-01-29 15:54:49 +02:00
Joose Sainio 54571529a4 Fix accessing previous frame that didn't exist 2020-01-17 10:48:35 +02:00
Joose Sainio 5c671d20e1 Use the new clipping only in situations where it actually helps 2020-01-17 09:08:21 +02:00
Joose Sainio 3c34d7c863 Fix qp estimation and checking of previous frames that don't exist 2020-01-15 09:32:04 +02:00
Joose Sainio 1a35c22a52 Change clipping of lambda and qp for ctus on OBA rc
instead of clipping qp and lambda to the value of last value from the state
clip to previous frame with same layer and if such frame doesn't exist, clip
to previous frame
2020-01-14 14:46:05 +02:00
Pauli Oikkonen c3d9e97e9f Fix VS build 2019-12-12 18:34:55 +02:00
Pauli Oikkonen 7f238ca299 Remove debug print functions
Whoops
2019-12-12 18:19:31 +02:00
Pauli Oikkonen eefb5e50b3 De-inline pred_filtered_dc functions, shouldn't make much difference though 2019-12-12 17:30:00 +02:00
Pauli Oikkonen 169314de4f 32x32 filtered DC prediction in AVX2 2019-12-11 18:17:06 +02:00
Pauli Oikkonen fb2481b7e4 16x16 filtered DC implemented in AVX2 2019-12-10 15:54:50 +02:00
Joose Sainio b78aa7b272 save c and k to frame 2019-12-06 10:52:54 +02:00
Joose Sainio 5b10e5fb7e parameterize the clipping option 2019-12-06 09:51:04 +02:00
Pauli Oikkonen da370ea36d Implement AVX2 8x8 filtered DC algorithm 2019-11-28 14:10:10 +02:00
Pauli Oikkonen 5d9b7019ca Implement a 4x4 filtered DC pred function 2019-11-26 17:05:54 +02:00
Joose Sainio ca0060cbba try the original clipping 2019-11-26 15:13:04 +02:00
Pauli Oikkonen f1485ab087 Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes? 2019-11-25 15:20:29 +02:00
Joose Sainio ab2fded8af Update threadwrapper to enable pthread_rwlock_t 2019-11-21 13:38:40 +02:00
Joose Sainio eb78aead1f Fix additional potential data races 2019-11-21 11:03:12 +02:00
Joose Sainio 35d7e0d88b Fix data race 2019-11-21 10:25:04 +02:00
Marko Viitanen 94d89f03c7 Added cfg variable intra_smoothing_disabled and some cleanup 2019-11-20 08:38:33 +02:00
Marko Viitanen eb2caf9118 Fix intra angle filter, changed from gauss filter table to run-time calculated 4-tap filter 2019-11-19 15:15:21 +02:00
Pauli Oikkonen 979d66031c Create a strategy out of intra_pred_filtered_dc 2019-11-19 14:50:31 +02:00
Marko Viitanen 466d8772b0 Apply JVET_P0170_ZERO_POS_SIMPLIFICATION in coeff bypass coding 2019-11-19 14:32:38 +02:00
Joose Sainio 0e8815a3d8 test clipping qp to previous frame instead of previous ctus 2019-11-19 14:32:31 +02:00
Joose Sainio ddb4e5a131 move the intra bit calculation so that it is used also with lambda rc 2019-11-19 14:16:48 +02:00
Joose Sainio a07833f3e6 check that mallocs in rc initialization were successful
only call kvz_update_after_picture when using the OBA rc
2019-11-19 13:59:44 +02:00
Joose Sainio 50d410a316 re-enable static qp encoding and lambda rc 2019-11-19 13:45:58 +02:00
Pauli Oikkonen fa4bb86406 Optimize intra_pred_planar_avx2 for 4x4 blocks 2019-11-19 13:39:02 +02:00
Marko Viitanen 3df2642b03 Fix qt cbf context init value 2019-11-19 13:27:36 +02:00
Joose Sainio 57e5615ece Fix incorrect intra rc calculation skipping 2019-11-19 13:25:31 +02:00
Joose Sainio 6cc3bcd87e Command line parameters for oba rc and implementation of the usage of the intra parameter 2019-11-19 09:29:06 +02:00
Joose Sainio eb73548af5 Encode first frame completely before starting others to enable owf 2019-11-18 09:51:37 +02:00
Marko Viitanen 17a53230fd Code cleanup, remove unused arrays and remove tabs 2019-11-18 09:01:23 +02:00
Pauli Oikkonen 4761d228f9 Start to vectorize the 4x4 loop 2019-11-15 17:32:40 +02:00
Pauli Oikkonen 8d45ab4951 Stupidify the 4x4 planar loop for vectorization 2019-11-14 17:14:04 +02:00
Marko Viitanen 91528f3292 Update contexts 2019-11-14 13:46:51 +02:00
Marko Viitanen b309ed90be Fix NAL packet and missing fields in SPS 2019-11-14 09:21:11 +02:00
Marko Viitanen 74514981a9 Fixed PPS, SPS and slice headers and NAL unit types 2019-11-13 15:59:36 +02:00
Joose Sainio c759c138ed Prepare the rc data structure to be shared among all frame encoders 2019-11-13 11:56:25 +02:00
Joose Sainio cdb7c851a4 Fix weight calculation 2019-11-13 08:55:31 +02:00
Joose Sainio b9b01f8036 WPP with threading 2019-11-12 12:12:57 +02:00
Joose Sainio 615973adca should enable threading with wpp when owf is not used 2019-11-12 09:03:00 +02:00
Pauli Oikkonen 6f13f6525c Merge branch 'new_prints' 2019-11-07 17:04:21 +02:00
Joose Sainio d353f7dd1a Disable debug prints, fix multiple bugs in the calculation 2019-11-07 15:08:57 +02:00
mercat 57e8c3ebc2 Merge branch 'ML-cplx_red_ICIP' 2019-11-07 13:25:47 +02:00
Pauli Oikkonen 558f0ec401 Mbps, not mbps 2019-11-05 18:06:00 +02:00
Pauli Oikkonen 2edf533925 Tidy the end report printing
Also fix a bug with non-integer target FPS
2019-11-05 17:20:00 +02:00
Joose Sainio 408fd4ccb6 Fix lambda and qp calculation for intra frames
also fixes a bug with selecting the clip neighbor lambda and clip neighbor qp
selection for inter frames
2019-11-05 10:51:39 +02:00
Pauli Oikkonen c7313ce567 Store AVG QP information in encmain 2019-11-04 17:08:07 +02:00
Reima Hyvönen 80575c59bf Some updates done to get right bitrate and avg QP 2019-10-31 15:56:24 +02:00
Reima Hyvönen 252bab8820 Added prints to bitrate and AVG QP 2019-10-31 15:56:24 +02:00
Pauli Oikkonen 6d7a4f555c Also remove 16x16 (A * B^T)^T matrix multiply
Can be done using (B * A^T) instead, it's the exact same
2019-10-28 16:19:42 +02:00
Pauli Oikkonen 2c2deb2366 Tidy AVX2 32x32 matrix multiply 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 98ad78b333 Tidy the old AVX2 32x32 matrix multiply
It was actually a very good algorithm, just looked messy!
2019-10-28 16:19:42 +02:00
Pauli Oikkonen 4a921cbdb5 Retain data as much in YMM registers as possible
This seems to make it a whole lot quicker
2019-10-28 16:19:42 +02:00
Pauli Oikkonen ac4d710e23 Unroll 32x32 matrix multiply, use all regs 2019-10-28 16:19:42 +02:00
Pauli Oikkonen a58608d0b8 Remove totally unnecessary (A * B^T)^T 32x32 multiply 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 043f53539f Implement a streamlined matrix-multiply 32x32 DCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen e9da2d851b Tidy 32x32 fast DCT's helper functions 2019-10-28 16:19:42 +02:00
Pauli Oikkonen e382339182 Implement fast (butterfly) 32x32 DCT in AVX2 2019-10-28 16:19:42 +02:00
Pauli Oikkonen b5962dadac Tidy indentation in AVX2 16x16 iDCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 36a8f89025 Fine-tune 16x16 AVX2 iDCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen ca9409de2b Implement 16x16 DCT as butterfly algorithm in AVX2 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 7c69a26717 Use aligned loads and stores for AVX2 DCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 8e9c65dca6 Align DCT matrices and temp transform buffers 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 148a150522 Align DCT source and dest blocks to cache line 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 8e60bbf6a6 Slightly tune 16x16 forward DCT
Use an array of __m256i's to store temporary value, essentially letting
the compiler enforce alignment and use aligned loads and stores.
2019-10-28 16:19:42 +02:00
Pauli Oikkonen c0cc0e8a75 Optimize 16x16 multiply by only slicing right mat once 2019-10-28 16:19:42 +02:00
Pauli Oikkonen e463d27f22 Implement streamlined generic 16x16 matrix multiply
It can't be this fast for real, can it?
2019-10-28 16:19:42 +02:00
Pauli Oikkonen beb85ce9d6 Reorder parameters for 8x8 matrix multiplies 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 292af62256 Implement tailored 16x16 forward DCT 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 30ce461d98 Redo 4x4 matrix multiplication 2019-10-28 16:19:42 +02:00
Pauli Oikkonen 07970ea82f Streamline by-the-book 8x8 matrix multiplication
Also chop up the forward transform into two tailored multiply functions
2019-10-28 16:19:42 +02:00
Pauli Oikkonen 7ec7ab3361 Implement a tailored AVX2 8x8 DCT 2019-10-28 16:19:42 +02:00
Joose Sainio 372934c7db Fix division by zero 2019-10-10 16:35:56 +03:00
Joose Sainio 9bdfdeaf5c Rest of the owl 2019-10-09 15:48:58 +03:00
Joose Sainio 1ba8525faf WIP 2019-10-09 10:35:07 +03:00
Joose Sainio 19496d2692 ? 2019-10-03 14:50:11 +03:00
Joose Sainio 4b111e339e fix couple of bugs in the implementation, bit calculation seems still bit off 2019-10-01 15:08:39 +03:00
Joose Sainio 84615e406a fix compiler warnings 2019-09-27 14:20:08 +03:00
Joose Sainio 14b7a75713 Call the new functions and fix bugs 2019-09-27 14:14:24 +03:00
Joose Sainio ef74bfb182 unify naming 2019-09-27 10:16:21 +03:00
Joose Sainio e36f481bda qp calculation for frame 2019-09-27 09:05:40 +03:00
Joose Sainio 47019ca1cd intra ck update 2019-09-26 16:04:53 +03:00
Joose Sainio 7c8f4da7cb Update c and k except after first intra 2019-09-26 13:09:28 +03:00
Joose Sainio 0577d481c1 CTU level code 2019-09-25 12:12:21 +03:00
pkubaj 1d7fcf4227 Fix build on powerpc64 with LLVM 2019-09-12 15:05:00 +02:00
mercat 0de567bfa4 Fix memory leak 2019-09-12 09:45:32 +03:00
mercat fa116de619 Add static 2019-09-11 16:18:12 +03:00
mercat b8753a9293 Fucking INLINE fixed 2019-09-11 16:12:07 +03:00
mercat b855144e68 INLINE fix 2019-09-11 16:12:07 +03:00
mercat 694337b803 Add const and more const 2019-09-11 16:12:07 +03:00
mercat 21c07638ed Remove const into kvz_init_constraint. 2019-09-11 16:12:06 +03:00
mercat 2bca507abe Clean version of machine learning constraint code. (ICIP paper) 2019-09-11 16:12:06 +03:00
Alexandre Mercat 0f4b7be6ee First version of ML ICIP code for master 2019-09-11 16:12:06 +03:00
Pauli Oikkonen 99597b828a Work around the ancient Win32 calling convention hassle
See if this'll work now
2019-09-06 13:14:42 +03:00
Pauli Oikkonen c5ca18950c Revert "Revert to 6924d90052 due to broken visual studio build"
This reverts commit 1dd0619bd7.
2019-09-05 18:21:55 +03:00
Pauli Oikkonen 55529decd5 Implement _mm256_insert_epi32 and extract pseudo-ops
Visual Studio headers apparently lack these guys
2019-09-05 18:20:52 +03:00
Marko Viitanen 28dc4fa2ed Fix intra MPM selection 2019-09-05 09:39:13 +03:00
Ari Lemmetti 147378e1f9 Prevent 8x4 and 4x8 bipred in merge analysis 2019-09-03 16:32:50 +03:00
Ari Lemmetti ef1fdbf259 Separate prediction of single PU/PB from CU/CB 2019-09-03 16:32:50 +03:00
Joose Sainio 7d2737bdf6 WIP picture lambda calculation 2019-09-03 11:03:35 +03:00
Ari Lemmetti 3bc510712f Enable merge analysis for smp and amp 2019-09-02 17:31:51 +03:00
Ari Lemmetti 557bcbc6aa Make luma or chroma only inter "recon" or predict possible 2019-09-02 17:15:28 +03:00
Marko Viitanen 6d5e20ca13 Header changes to match VTM 6.1 2019-09-02 09:42:35 +03:00
RLamm 60be6d411c Intra filtering fixed at least for luma. All intra modes output valid luma (hashes match), but chroma is still broken. 2019-08-30 16:14:00 +03:00
RLamm 83ac39094a Use new PDPC filtering for planar and DC modes 2019-08-29 12:51:34 +03:00
Joose Sainio 131c04f65c Fix incorrect weight for intra frame 2019-08-29 12:01:13 +03:00
Joose Sainio 8f96678d13 Fix issue with intra frames being part of gop when they shouldn't 2019-08-29 09:28:10 +03:00
Ari Lemmetti aa8ab195d1 Compare rough cost of the best merge mode against AMVP to make mode decision 2019-08-26 22:49:09 +03:00
Ari Lemmetti 8f866ff83a Use correct index 2019-08-26 20:10:10 +03:00
Ari Lemmetti 2343958a14 Fix transform split for small luma blocks 2019-08-24 21:50:17 +03:00
Ari Lemmetti 800fc8644d Reset CBFs because CBFs might have been set earlier for depth earlier. 2019-08-24 21:49:33 +03:00
Ari Lemmetti a80de22bc7 Add only different candidates to the list 2019-08-24 21:49:33 +03:00
Ari Lemmetti 45c7961412 Remove tr depth fill. It should not be needed. 2019-08-24 21:49:32 +03:00
Ari Lemmetti ff8711aaab Add missing logic to add valid indices to list 2019-08-24 21:49:29 +03:00
Marko Viitanen cb0d7c340a Use the new PDPC filtering in angular intra 2019-08-23 14:44:41 +03:00
Marko Viitanen 5bebb18943 Change intra filtering according to VTM6 2019-08-23 08:56:35 +03:00
Marko Viitanen a16efe6b52 Merge remote-tracking branch 'remotes/github_kvazaar/master'
# Conflicts:
#	build/kvazaar_VS2013.sln
#	build/kvazaar_VS2015.sln
#	build/kvazaar_VS2017.sln
#	build/kvazaar_cli/kvazaar_cli.vcxproj
#	build/kvazaar_lib/kvazaar_lib.vcxproj
#	build/kvazaar_tests/kvazaar_tests.vcxproj
#	src/encode_coding_tree.c
#	src/encode_coding_tree.h
#	src/encoder_state-bitstream.c
#	src/inter.c
#	src/strategies/avx2/quant-avx2.c
2019-08-22 15:12:01 +03:00
Marko Viitanen 01ea762c1f Fix coeff coding and remove bdpcm flag -> CABAC bits match with VTM 6.0 2019-08-22 14:33:42 +03:00
Marko Viitanen 210af8adbe Remove joint_cb_cr flag and fix split_flag context selection 2019-08-22 11:23:24 +03:00
Marko Viitanen c713d31c93 Fix sig_coeff context selection 2019-08-22 10:57:50 +03:00
Marko Viitanen 48b8898e53 Fix CBF context init and use 2019-08-22 10:44:47 +03:00
Marko Viitanen db94ec1a84 Rename intra_mode_model -> intra_luma_mpm_flag_model and update the contexts 2019-08-19 15:17:25 +03:00
Marko Viitanen 1c6ffc0a7e Fix wrong variable types in context init 2019-08-19 14:33:55 +03:00
Marko Viitanen cd6be15e10 Fix context init to match VTM6.0 2019-08-19 13:57:31 +03:00
Marko Viitanen 3de198d2db Sync contexts with VTM6.0 2019-08-19 09:39:59 +03:00
Marko Viitanen e644b03615 Fix headers to match VTM6.0rc1 2019-08-16 15:33:20 +03:00
Ari Lemmetti 1dd0619bd7 Revert to 6924d90052 due to broken visual studio build 2019-08-08 15:15:34 +03:00
Pauli Oikkonen 2852baa673 Separate sign3_diff_epu8 from calc_eo_cat
Just to keep things simple, clear and obvious
2019-08-07 16:35:24 +03:00
Pauli Oikkonen 17947b79ee Add sao_shared_generics.h in Makefile.am 2019-08-07 16:35:24 +03:00
Pauli Oikkonen a8dd6ce351 Add a note about having implemented a separate AVX2 version of SAO offset array calculation 2019-08-07 16:35:24 +03:00
Pauli Oikkonen a858e7dd4b Combine duplicate code into inline functions 2019-08-07 16:35:24 +03:00
Pauli Oikkonen de0e97f711 Take 8/16/24b loads and stores into separate functions 2019-08-07 16:35:24 +03:00
Pauli Oikkonen 10979f58fe Tidy up code 2019-08-07 16:35:24 +03:00
Pauli Oikkonen 9cc11976c0 Combine the delta accumulation from edge and band ddistortion into shared func
This won't reduce object size, but there'll be less duplicate code
2019-08-07 16:35:24 +03:00
Pauli Oikkonen 55d877bd66 Vectorize sao_edge_ddistortion 2019-08-07 16:35:24 +03:00
Pauli Oikkonen aef0f301d3 Fix function signatures
Mark anything intended as read-only to be const, and fix alignment
2019-08-07 16:35:24 +03:00
Pauli Oikkonen 997fd369b3 Redo calc_sao_edge_dir_avx2
Do it wider, 32 pixels at once!
2019-08-07 16:35:24 +03:00
Pauli Oikkonen db1e475e02 Use i32 instead of i8 for x/y offsets
Doesn't matter too much, because this number isn't used in SIMD
computation, only as a memory reference offset.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen 12de466ef5 Reimplement non-band SAO color reconstruction in AVX2
Streamline things to work on 32 pixels at once instead of 8
2019-08-07 16:35:24 +03:00
Pauli Oikkonen e8bff99329 Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction
Vectorize it all, hope this helps with perf
2019-08-07 16:35:24 +03:00
Pauli Oikkonen 7b5dffa855 Implement calc_sao_offset_array in AVX2
To be efficient, the AVX2 color reconstruction algorithm will need
offsets in byte, not dword, arrays. This is completely specific to 8-bit
pixels and the function signature is fundamentally distinct from the
generic algorithm, so it's better to not strategize SAO offset array
calculation.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen 29563b7039 Make kvz_calc_sao_offset_array more obvious
Name temporary values from array lookups etc that are referred multiple
times to, to make the behavior of the mechanism more transparent. Define
all the constant values at the beginning of the function and declare as
const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen 08881f5e9b (TEMP) (TODO) (whatever) Avoid compiler warnings
I want the CI to not crash on its -Wall -Werror, but instead to actually
build the thing and report me about actual memory errors etc
2019-08-07 16:35:24 +03:00
Pauli Oikkonen c18adc5ee0 Redo sao_band_ddistortion_avx2
Avoid branching and do the entire thing on 32 pixels at once in YMMs.
Also make the sao_bands function parameter const.
2019-08-07 16:35:24 +03:00
Pauli Oikkonen 2827c3e3ab Make calc_sao_bands less opaque 2019-08-07 16:35:24 +03:00
Pauli Oikkonen 1bb9a079a8 Fix indentation 2019-08-07 16:35:24 +03:00
Reima Hyvönen 7bc959c7c5 3 sao functions are now working 2019-08-07 16:35:24 +03:00
Reima Hyvönen 0e0f2d3490 made to clear sum vector after it has been set to memory 2019-08-07 16:35:24 +03:00
Reima Hyvönen f146de7acb removed some variables to prevent memory losses 2019-08-07 16:35:24 +03:00
Reima Hyvönen 247c3a7a71 converted signed to unsigned int 2019-08-07 16:35:24 +03:00
Reima Hyvönen ac5c216974 Some more memory error preventing to sao_edge_ddistortion_avx2 2019-08-07 16:35:24 +03:00
Reima Hyvönen 3fb1cbca35 more editing sao_edge_ddistortion_avx2 2019-08-07 16:35:24 +03:00
Reima Hyvönen afbb6fb960 some more modifications to sao_edge_ddistortion_avx2 to prevent memory failures 2019-08-07 16:35:24 +03:00
Reima Hyvönen 3496a57f7a Edited sao_edge_ddistortion_avx2 to avoid memory overflow 2019-08-07 16:35:24 +03:00
Reima Hyvönen 267ba1d6ce Modified sao_band_ddistortion_avx2 2019-08-07 16:35:24 +03:00
Reima Hyvönen e70663b245 added some sub commands to avoid memory read errors 2019-08-07 16:35:24 +03:00
Reima Hyvönen 59dfb4570c Converted some loads to load int8_t instead ints 2019-08-07 16:35:24 +03:00
Reima Hyvönen 8b253209a8 Found false address load from calc_sao_edge_dir. Should now work like generic 2019-08-07 16:35:24 +03:00
Reima Hyvönen 50e0a47b7a Took away __restrict 2019-08-07 16:35:24 +03:00
Reima Hyvönen 8a39eb674e Removed c-variable from calc_sao_edge_dir_avx2 2019-08-07 16:35:24 +03:00
Reima Hyvönen bc0a36830d Clarified some 6 pixel loads 2019-08-07 16:35:24 +03:00
Reima Hyvönen 1a8b211e05 Added break to line 170 2019-08-07 16:35:24 +03:00
Reima Hyvönen d05e750ebe Added some switches to prevent segmentation fault from reading 2019-08-07 16:35:24 +03:00
Reima Hyvönen 203580047d Defined some AVX functions 2019-08-07 16:35:24 +03:00
Reima Hyvönen c884c738b1 Updated some commands to match the standard 2019-08-07 16:35:24 +03:00
Reima Hyvönen b412ed2f59 Removed some setr and used loads calc_sao_edge_dir_avx2 2019-08-07 16:35:24 +03:00
Reima Hyvönen c6cc063534 converted some hadd operations at calc_sao_edge_dir_avx2 to cast and extract 2019-08-07 16:35:24 +03:00
Reima Hyvönen 47ac109b10 optimized some sao_reconstruct_color_avx2 when sao->type == SAO_TYPE_BAND 2019-08-07 16:35:24 +03:00
Reima Hyvönen 96dc60a1ed first working optimization 2019-08-07 16:35:24 +03:00
Reima Hyvönen c148aff9fb Some optimization done to function sao_reconstruct_color_avx2 2019-08-07 16:35:24 +03:00
Reima Hyvönen bf16ba6cc4 Remade sao_edge_ddistortion_avx2 and calc_sao_edge_dir_avx2 2019-08-07 16:35:24 +03:00
Reima Hyvönen 79dc39a676 Some editing for sao_edge_ddistortion_avx2 2019-08-07 16:35:24 +03:00
Reima Hyvönen 06ee52924e some reconst done to calc_sao_edge_dir_avx2 2019-08-07 16:35:24 +03:00
Reima Hyvönen 5fbc65d823 reconst optimization doesn't work yet 2019-08-07 16:35:24 +03:00
Reima Hyvönen d29f834a69 Remove useless function 2019-08-07 16:35:24 +03:00
Reima Hyvönen a232a12160 calc_sao_edge_dir_avx2 updated 2019-08-07 16:35:24 +03:00
Reima Hyvönen b1febc02a5 sao_edge_ddistortion_avx2 now working properly 2019-08-07 16:35:24 +03:00
Reima Hyvönen cd6092a1ec Still too much bits, looking for where they appear 2019-08-07 16:35:24 +03:00
Reima Hyvönen 7853be8eeb Incomplete optimization 2019-08-07 16:35:24 +03:00
Marko Viitanen dfa5621024 Intrapred cleanup 2019-07-16 14:23:10 +03:00
Ari Lemmetti 40609aa865 Add missing headers to Makefile.am 2019-07-12 19:15:51 +03:00
Ari Lemmetti 5db3a78499 Bump versions for release 1.3 2019-07-09 22:09:32 +03:00
Ari Lemmetti d513ab1999 Add missing newline 2019-07-09 21:06:05 +03:00
Ari Lemmetti 4967072625 Do not bypass search on skip cu if early_skip is not enabled 2019-07-09 20:20:12 +03:00
Ari Lemmetti b20992a9f3 Rename functions more descriptive 2019-07-09 20:20:11 +03:00
Ari Lemmetti a348a0ec23 Fix transform depth in early skip 2019-07-09 20:05:48 +03:00
Pauli Oikkonen 8d48bee180 Tidy fast coeff cost code 2019-07-09 18:01:54 +03:00
Pauli Oikkonen 201a43b08e Clean up the RD-estimation code 2019-07-09 18:01:54 +03:00
Pauli Oikkonen b111df5073 Create preliminary version of improved cost estimator 2019-07-09 18:01:54 +03:00
Ari Lemmetti be08a87d94 Add missing parameter max-merge to the help message 2019-07-09 16:28:46 +03:00
Ari Lemmetti d0bb9b4a6d Add parameter max-merge to presets 2019-07-09 16:26:03 +03:00
Ari Lemmetti 4097331fd6 Early skip 2019-07-09 15:59:31 +03:00
Marko Viitanen 10d850e98a Use index_offset in intra angular and change the offset to width+1 2019-07-08 14:23:19 +03:00
Marko Viitanen 3d1fa2a9cf Fixing angular intra prediction reference pixels 2019-07-08 14:00:02 +03:00
Marko Viitanen 0656c54cab Fix some problems with reference pixels in angular intra prediction kvz_angular_pred_generic() 2019-07-05 15:54:51 +03:00
Marko Viitanen 89ca2d4ba1 Use correct type for modedisp2sampledisp array 2019-07-05 14:12:10 +03:00
Marko Viitanen 2e8a0d08f9 Fix mvp_idx_model initialization and use 2019-07-05 14:11:29 +03:00
Joose Sainio 977e885ea2 Fix issue with gop=0 introduced in 1c36f68d0c 2019-07-05 12:57:27 +03:00
Marko Viitanen c6217e236f Enable 4-tap filtering for the intra angular 2019-07-04 16:26:10 +03:00
Marko Viitanen cda6d951c0 Change DCT arrays back to 8-bit -> some frames are now correct 2019-07-04 15:59:10 +03:00
Marko Viitanen 8280bd3217 Add channel info to angular_pred and fix the displacement tables.
Also includes 4-tap intra filtering code commented out
2019-07-04 09:35:47 +03:00
Marko Viitanen 5e4369d6b0 Fix the kvz_cabac_encode_aligned_bins_ep function -> cabac coding now correct 2019-07-03 15:55:52 +03:00
Marko Viitanen 3fad4b0a98 Disable kvz_cabac_encode_aligned_bins_ep for now and add a ToDo message 2019-07-03 15:44:35 +03:00
Sami Ahovainio ce1e67cc3a Modified header flags to match VTM commit b9080ff45bec368c44f0c43a32dcd6804ef9f5d6 2019-07-01 13:58:15 +03:00
Sami Ahovainio 3863064d90 Fixed bugs in split decision and coefficient coding. 2019-07-01 13:00:43 +03:00
Mikko Pitkänen a7f09c8114 Merge branch 'threadwrapper' 2019-06-24 16:54:59 +03:00
Sami Ahovainio db5c0230e5 Fixed coefficient sign hiding 2019-06-20 16:26:01 +03:00
Sami Ahovainio b51254cafd Fixed significant coefficient group context calculation 2019-06-20 15:47:13 +03:00
Sami Ahovainio 5e0bea962c Fixed split context decision 2019-06-20 15:30:49 +03:00
Sami Ahovainio 12322144f0 Removed debug print from context.c 2019-06-20 15:18:22 +03:00
Sami Ahovainio 3a9800d07d Fixed coefficient coding. Fixed headers to match VTM commit e65075531471a68632bc9252d607655a0feeabc6 2019-06-20 14:43:03 +03:00
Mikko Pitkänen 3dd606ce2e Add new threadwrapper 2019-06-18 18:45:45 +03:00
Sami Ahovainio 2c78aa0642 Fixes to coeff coding. 2019-06-13 12:01:29 +03:00
Joose Sainio c94077d15e remove hardcoded value 2019-06-12 14:37:41 +03:00
Joose Sainio ac68c8444d remove negation that wasn't supposed to be there 2019-06-12 14:35:24 +03:00
Joose Sainio 5851dcc3be missing negation 2019-06-12 14:08:18 +03:00
Joose Sainio 1c36f68d0c Fix owf>=9 gop=8 and add test to catch such problem in future 2019-06-12 14:04:41 +03:00
Sami Ahovainio 3564b4829e Fixed split context decision. Modified intra mode initialization to match VTM version aa76fc5c04cf43390f43d63f9977bea8ee31997a. 2019-06-12 12:59:16 +03:00
Sami Ahovainio a8a53e15b5 Fixed headers to match VTM commit aa76fc5c04cf43390f43d63f9977bea8ee31997a. Added multi_ref_line flag coding. 2019-06-07 13:37:45 +03:00
Ari Lemmetti 933ff6ed55 Merge branch 'set-qp-in-cu-fix' 2019-06-07 09:01:03 +03:00
Sami Ahovainio 8d2581e58c Fixed issue with kvz_go_rice_par_abs where passing a unsigned argument caused MIN function to return wrong value. Modified coefficient coding to match VTM 5.0. Some issues still remain. 2019-06-05 15:57:18 +03:00
Sami Ahovainio 367f1b2129 Fixed splitting bug caused by wrong values in the headers. Fixed header flags to match VTM commit 5703e81b2de677d976ec15423f5768b17619ba6a 2019-06-05 11:21:02 +03:00
Sami Ahovainio 76d56290ed Fixed VUI header writing. Fixed debug prints of NAL headers and rbsp_stop_one_bit. 2019-05-31 11:13:11 +03:00
Ari Lemmetti c6da839002 Set lcu sqrt lambda according to lcu lambda instead of frame lambda when ROI is used 2019-05-29 18:32:10 +03:00
Marko Viitanen 8282a18c36 Fixed headers and NAL writing to match the latest VTM master 988c22cbb9c58584cac3ef0ec7794cafbea6dfd6 2019-05-29 16:18:35 +03:00
Sami Ahovainio 4768ba0628 Minor fixes to header writing. Added contexts for multi_ref_line and BDPCM. Functions added for writing both in bitstream, but they are both disabled for now. 2019-05-29 13:00:19 +03:00
Sami Ahovainio 3339e12169 Fixed some header flags 2019-05-27 09:56:56 +03:00
Ari Lemmetti 9339845e8b Set QP completely at CU level as the name '--set-qp-in-cu' implies
-Move slice delta QP to CU level when using --set-qp-in-cu
-Separate functionality from roi
2019-05-24 20:38:39 +03:00
Pauli Oikkonen 081d16fc33 Fix intrinsics that may be missing on some systems
Create a header to collect all the workarounds for missing intrinsics
in one place
2019-05-23 19:59:40 +03:00
Sami Ahovainio 5b46fbd878 Added multi_ref_idx variable for intra coding (is 0 throughout the code for now). Modified prediction flag writing. Chroma pred flag remains unchanged (ToDo). Added bitstream debug printing on VERBOSE mode. 2019-05-21 12:28:05 +03:00
Sami Ahovainio ed4e218702 Updated coefficient coding to match VTM 5.0 2019-05-13 15:30:43 +03:00
Sami Ahovainio 504c3dfd1b Modified the headers to match current VTM headers 2019-05-07 16:30:06 +03:00
Marko Viitanen 30a8a7b97c WIP fixing the last significant xy coding 2019-05-07 15:01:02 +03:00
Pauli Oikkonen 87a9208db8 Eliminate cvtsi64_si128 intrinsic
Apparently it'll cause Win32 builds to break because it emits the movq
instruction or something..
2019-04-17 16:30:40 +03:00
Pauli Oikkonen 7175d20bb2 Still include stdint.h for non-vector builds 2019-04-15 19:36:01 +03:00
Pauli Oikkonen 1315c7e2b0 Do not compile any vector code for non-SSE4/AVX2 builds 2019-04-15 19:10:48 +03:00
Pauli Oikkonen f5f70e7bc5 Merge branch 'sad-optimization' 2019-04-15 19:02:01 +03:00
Jan Beich 85f46e17a9 Detect AltiVec via elf_aux_info() on FreeBSD 12+ 2019-04-01 13:08:04 +00:00
Jan Beich 82486255da Simplify AltiVec detection on Linux 2019-04-01 13:08:04 +00:00
Marko Viitanen 1546acfdb9 New NAL unit IDs and header changes 2019-03-28 10:11:36 +02:00
Marko Viitanen 36eab9c170 New cabac context models with "rate" 2019-03-27 12:38:19 +02:00
Marko Viitanen 3bdc8ac8d3 Fix intra_chroma_pred_mode and cbf contexts 2019-03-26 09:10:09 +02:00
Marko Viitanen d15f58517f Changed intra coding to use 6 MPM, implemented merge sort and MPM selection 2019-03-20 15:20:31 +02:00
Marko Viitanen 1081336868 Updated intra pred mode init values 2019-03-20 15:18:32 +02:00
Marko Viitanen f3acd245ae New cabac coding function: kvz_cabac_encode_trunc_bin 2019-03-20 15:17:54 +02:00
Marko Viitanen 80d6e4bf05 New split flag calculations 2019-03-20 09:07:58 +02:00
Marko Viitanen 8c84348010 New entropy bit table 2019-03-20 09:07:22 +02:00
Marko Viitanen 2d0348aa6d New context models 2019-03-20 09:06:57 +02:00
Marko Viitanen 052080747e New CABAC functions 2019-03-20 09:06:26 +02:00
Marko Viitanen 20667fdba6 Update header bits to VTM 4.0+ 2019-03-11 14:02:12 +02:00
Pauli Oikkonen 6d43759604 Create a border-respecting 32-wide AVX hor_sad 2019-03-07 18:01:22 +02:00
Pauli Oikkonen f218cecb38 Remove offending hor_sad_avx2_w32 function
Consider possibly creating a non-offending AVX2 version instead, the
way hor_sad_sse41_w32 works. Or maybe there's more essential work to
do.
2019-03-05 22:51:41 +02:00
Pauli Oikkonen df2e6c54fd 4-unroll hor_sad_sse41_arbitrary
This may not increase perf though because it's so rarely used
function, so keeping icache footprint may be more essential...
2019-03-05 22:45:23 +02:00
Pauli Oikkonen 448eacba7b Avoid overreading block borders in hor_sad_sse41_arbitrary 2019-03-05 22:34:50 +02:00
Eemeli Kallio c159e275b7 Merge branch 'max_merge' 2019-03-05 14:39:03 +02:00
Pauli Oikkonen 41f51c08c4 Avoid overrunning buffer in hor_sad_sse41_w32 2019-03-01 15:37:38 +02:00
Pauli Oikkonen bcd9879359 Include quant coeff range check in non-scaling list execution path too 2019-02-27 17:26:44 +02:00
Pauli Oikkonen 24e6363f64 Remove the kvz_quant_avx2 wrapper function 2019-02-27 16:32:58 +02:00
Pauli Oikkonen 748820f3c5 Eliminate unnecessary loading of coeffs if scaling lists are off 2019-02-27 16:26:35 +02:00
Pauli Oikkonen 5994350f40 Allow quant_flat_avx2 to be used with scaling lists on 2019-02-27 16:25:59 +02:00
Eemeli Kallio 7f4e0acf41 Added check if max-merge is out of bounds 2019-02-19 13:53:42 +02:00
Pauli Oikkonen 9b0e079262 Use SSE instructions for 64-bit SADs instead of MMX
VC++ seems to choke on MMX instructions
2019-02-18 20:13:33 +02:00
Pauli Oikkonen d8b8923028 Add LGPL notices to reg_sad headers 2019-02-18 17:52:47 +02:00
Eemeli Kallio 2a40560888 some variables to const 2019-02-12 11:24:10 +02:00
Eemeli Kallio 8f8e7bb53c Added possibility to reduce the maximum number of merge candidates. 2019-02-12 09:21:03 +02:00
Marko Viitanen 1165219842 Update PTL, SPS ext and SPS flags to match VTM 4rc1 2019-02-07 10:00:04 +02:00
Pauli Oikkonen 770db825b9 Create hor_sad_w8 and w4 epol mask the way w16 works 2019-02-06 19:34:26 +02:00
Pauli Oikkonen aa19bcac8a Avoid branching in creating shuffle mask in hor_sad_w16 2019-02-06 18:58:46 +02:00
Pauli Oikkonen 2d05ca8520 Remove width from constant-width hor_sad func params
They should kinda know it already
2019-02-04 20:41:40 +02:00
Pauli Oikkonen 57db234d95 Move 32-wide SSE4.1 hor_sad to picture-sse41.c
It's not used by picture-avx2.c that also includes the header, so
it should not be in the header
2019-02-04 20:41:40 +02:00
Pauli Oikkonen dd7d989a39 Implement 32-wide hor_sad on AVX2 2019-02-04 20:41:40 +02:00
Pauli Oikkonen ff70c8a5ec Utilize horizontal SAD functions for SSE4.1 as well 2019-02-04 20:41:40 +02:00
Pauli Oikkonen f5ff4db01f 4-wide hor_sad border agnostic 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 35e7f9a700 Fix hor_sad w8 to work with both borders 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 836783dd6e Use hor_sad_w32 for both left and right borders 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 69687c8d24 Modify hor_sad_sse41_w16 to work over left and right borders 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 51c2abe99a Modify image_interpolated_sad to use kvz_hor_sad 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 1e0eb1af30 Add generic strategy for hor_sad'ing an non-split width block 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 686fb2c957 Unroll arbitrary-width SSE4.1 hor_sad by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 768203a2de First version of arbitrary-width SSE4.1 hor_sad 2019-02-04 20:41:40 +02:00
Pauli Oikkonen ccf683b9b6 Start work on left and right border aware hor_sad
Comes with 4, 8, 16 and 32 pixel wide implementations now, at some point
investigate if this can start to thrash icache
2019-02-04 20:41:40 +02:00
Pauli Oikkonen 760bd0397d Pad the image buffer by 64 bytes from both ends
This will be necessary for an efficient and straightforward
implementation of hor_sad for blocks over 16 pixels wide, because they
cannot use the shuffle trick because inter-lane shuffling is so hard to
do
2019-02-04 20:41:40 +02:00
Pauli Oikkonen c36482a11a Fix bug in 24-wide SAD
*facepalm*
2019-02-04 20:41:40 +02:00
Pauli Oikkonen f781dc31f0 Create strategy for ver_sad
Easy to vectorize
2019-02-04 20:41:40 +02:00
Pauli Oikkonen ca94ae9529 Handle extrapolated blocks with unmodified width using optimized_sad pointer 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 91b30c7064 Tidy up kvz_image_calc_sad 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 9db0a1bcda Create get_optimized_sad func for SSE4.1 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 91380729b1 Add generic get_optimized_sad implementation
NOTE: To force generic SAD implementation on devices supporting
vectorized variants, you now have to override both get_optimized_sad
and reg_sad to generic (only overriding get_optimized_sad on AVX2
hardware would just run all SAD blocks through reg_sad_avx2). Let's
see if there's a more sensible way to do it, but it's not trivial.
2019-02-04 20:41:40 +02:00
Pauli Oikkonen 45f36645a6 Move choosing of tailored SAD function higher up the calling chain 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 91cb0fbd45 Create strategy for directly obtaining pointer to constant-width SAD function 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 94035be342 Unify unrolling naming conventions 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 517a4338f6 Unroll SSE SAD for 8-wide blocks to process 4 lines at once 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 0f665b28f6 Unroll arbitrary width SSE4.1 SAD by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen cbca3347b5 Unroll 64-wide AVX2 SAD by 2 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 84cf771dea Unroll 32 and 16 wide SAD vector implementations by 4 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 5df5c5f8a4 Cast all pointers to const types in vector SAD funcs
Also tidy up the pointer arithmetic
2019-02-04 20:41:40 +02:00
Pauli Oikkonen a711ce3df5 Inline fixed width vectorized SAD functions 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 6504145cce Remove 16-pixel wide AVX2 SAD implementation
At least on Skylake, it's noticeably slower than the very simple
version using SSE4.1
2019-02-04 20:41:40 +02:00
Pauli Oikkonen 4cb371184b Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 796568d9cc Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 4d45d828fa Use constant-width SSE4.1 SAD funcs for AVX2 2019-02-04 20:41:40 +02:00
Pauli Oikkonen 2eaa7bc9d2 Move SSE4.1 SAD functions to separate header 2019-02-04 20:41:40 +02:00
Pauli Oikkonen d2db0086e1 Create constant width SAD versions for 8 and 16 pixels 2019-02-04 20:41:40 +02:00
Pauli Oikkonen a13fc51003 Include a blank AVX2 strategy registration function even in non-AVX2 builds 2019-02-04 19:52:24 +02:00
Pauli Oikkonen d55414db66 Only build AVX2 coeff encoding when supported
..whoops
2019-02-04 19:34:30 +02:00
Pauli Oikkonen 3fe2f29456 Merge branch 'encode-coeffs-avx2' 2019-02-04 18:52:31 +02:00
Pauli Oikkonen 722b738888 Fix more naming issues 2019-02-04 16:05:43 +02:00
Pauli Oikkonen e26d98fb75 Rename a couple variables and add crucial comments 2019-02-04 15:57:07 +02:00
Pauli Oikkonen f186455619 Move encode_last_significant_xy out of strategy modules
It's the exact same in both AVX2 and generic, and does not seem to
be worth even trying to vectorize
2019-02-04 14:55:41 +02:00
Pauli Oikkonen 3f7340c932 Fine-tune pack_16x16b_to_16x2b
Avoid mm_set1 operation when it's possible to create the constant with
one bit-shift operation from another instead. Thanks Intel for
3-operand instruction encoding!
2019-02-04 14:44:47 +02:00
Pauli Oikkonen 314f5b0e1f Rename 16x2b cmpgt function, comment it better, optimize it slightly
Eliminate an unnecessary bit masking to make it even more messy
2019-02-04 14:44:32 +02:00
Pauli Oikkonen d8ff6a6459 Fix _andn_u32 to work on old Visual Studio 2019-02-01 15:34:42 +02:00
Pauli Oikkonen 26e1b2c783 Use (u)int32_t instead of (unsigned) int in reg_sad_sse41 2019-01-10 14:37:04 +02:00
Pauli Oikkonen 3a1f2eb752 Prefer SSE4.1 implementation of SAD over AVX2
It seems that the 128-bit wide version consistently outperforms the
256-bit one
2019-01-10 13:48:55 +02:00
Pauli Oikkonen 9b24d81c6a Use SSE instead of AVX for small widths
Highly dubious if this will help performance at all
2019-01-07 20:12:13 +02:00
Pauli Oikkonen b2176bf72a Optimize SSE4.1 version of SAD
Make it use the same vblend trick as AVX2. Interestingly, on my test
setup this seems to be faster than the same code using 256-bit AVX
vectors.
2019-01-07 19:40:57 +02:00
Pauli Oikkonen 887d7700a8 Modify AVX2 SAD to mask data by byte granularity in AVX registers
Avoids using any SAD calculations narrower than 256 bits, and
simplifies the code. Also improves execution speed
2019-01-07 18:53:15 +02:00
Pauli Oikkonen 7585f79a71 AVX2-ize SAD calculation
Performance is no better than SSE though
2019-01-07 16:26:24 +02:00
Pauli Oikkonen ab3dc58df6 Copy SAD SSE4.1 impl to AVX2 2019-01-03 18:31:57 +02:00
Pauli Oikkonen 45ac6e6d03 Tidy pack_16x16b_to_16x2b comments 2019-01-03 16:37:05 +02:00
Ari Lemmetti cd818db724 Add missing quantization and residual in cost calculation (inter rd=2). 2018-12-21 15:55:29 +02:00
Pauli Oikkonen 016eb014ad Move packing 16x16b -> 16x2b into separate function 2018-12-20 10:51:44 +02:00
Ari Lemmetti b234897e8a Fix smp and amp blocks in fme and revert previous change.
Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc.
Calculate SATD on the 8x4, ... part
2018-12-19 21:30:53 +02:00
Pauli Oikkonen 9aaa6f260d Fixes to enable portability 2018-12-18 20:42:09 +02:00
Pauli Oikkonen 2fdbbe9730 Move CG reordering code from quant-avx2 to shared header 2018-12-18 19:42:18 +02:00
Pauli Oikkonen d02207306d Create a header file for shared AVX2 code 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 361bf0c7db Precompute >=2 coeff encoding loop with 2-bit arithmetic
Who needs 16x16b vectors when you can do practically the same with
16x2b pseudovectors in 32-bit general purpose registers!
2018-12-18 19:41:09 +02:00
Pauli Oikkonen 940b0e9e6a Require BMI2 for AVX2 build
Any processor implementing AVX2 should also implement BMI2
2018-12-18 19:41:09 +02:00
Pauli Oikkonen f66cb23d5b Optimize greater1 encoding loop
Calculating the c1 variable need not be a serial operation!
2018-12-18 19:41:09 +02:00
Pauli Oikkonen 8c8b791c35 Vectorize kvz_context_get_sig_ctx_inc 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 033261eb74 Eliminate two branches using bit magic 2018-12-18 19:41:09 +02:00
Pauli Oikkonen c4434e8d04 Scan CG's in forward order to simplify finding last significant 2018-12-18 19:41:09 +02:00
Pauli Oikkonen efd097f5a5 Vectorize the coeff group loop to some extent 2018-12-18 19:41:09 +02:00
Pauli Oikkonen a01362e638 use the efficient method of reordering raster->scan 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 50a888e789 Use the efficient method to find first and last nz coeffs in block 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 7e9203f566 Scan coeff groups in scan order to help find last significant one 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 9a5a6fdbc7 Simplify two ifs in encode_coeff_nxn-avx2 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 37a2a8bac8 See if loop can be optimized by rearranging 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 584f2f74b6 Vectorize significant coeff group scanning loop 2018-12-18 19:41:09 +02:00
Pauli Oikkonen 1bfed73221 Add AVX2 strategy for encode_coding_tree 2018-12-18 19:41:09 +02:00
Pauli Oikkonen c3a6f3112a Add generic strategy group for encode_coding_tree 2018-12-18 19:41:09 +02:00
Marko Viitanen 1ef851ab4b Disable FME on amp/smp blocks with width or height not divisible by 8 2018-12-18 10:28:21 +02:00
Joose Sainio b71c5573f0 Merge branch 'rate_control_fix' 2018-12-17 12:39:27 +02:00
Sergei Trofimovich 68a70e45a1 x86 asm: mark stack as non-executable
Gentoo's `scanelf` QA tool detects writable/executable stack
of assembly-written files as:

```
$ scanelf -qRa  .
 0644 LE !WX --- ---     ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o
 0644 LE !WX --- ---     ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o
 0644 LE !WX --- ---     ./src/strategies/x86_asm/picture-x86-asm-sad.o
 0644 LE !WX --- ---     ./src/strategies/x86_asm/picture-x86-asm-satd.o
```

Normally the C compiler emits non-executable stack marking (or GNU assembler
via `-Wa,--noexecstack`).

The change adds non-executable stack marking for yasm-based assembly files.

https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details.

Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
2018-12-16 11:31:56 +00:00
Reima Hyvönen 1fcc5c6a8d Merge branch 'bipred_recon' 2018-12-11 09:59:35 +02:00
Reima Hyvönen e4a10880f3 Added case 12 to bipred_recon no mov 2018-12-11 09:52:17 +02:00
Marko Viitanen a4f3968e52 Fix Visual Studio errors by initializing some variables used in AVX2 signhiding 2018-12-11 09:33:26 +02:00
Ari Lemmetti ac943147e3 Calculate satd cost for whole non-square blocks as well. 2018-12-10 17:04:29 +02:00
Pauli Oikkonen c465578048 Add a descriptive comment to coefficient reordering 2018-12-03 15:36:32 +02:00
Pauli Oikkonen f78bf2ebcb Optimize q_coefs usage for indexed fetch 2018-12-03 15:36:32 +02:00
Pauli Oikkonen d9591f1b49 Eliminate midway buffering of reordered coefs
TODO: For some mysterious reason seems slightly slower than the
buffered one
2018-12-03 15:36:32 +02:00
Pauli Oikkonen 7fe454c51f Optimize get_cheapest_alternative() 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 6bbd3e5a44 Optimize rearrange_512 function 2018-12-03 15:36:32 +02:00
Pauli Oikkonen cb8209d1b3 Vectorize transform coefficient reordering loop 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 7cf4c7ae5f Rename "reduce" functions to hsum
That's what the functions fundamentally do anyway
2018-12-03 15:36:32 +02:00
Pauli Oikkonen 316cd8a846 Fix ALIGNED keyword and grow alignment to 64B 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 1befc69a4c Implement sign bit hiding in AVX2 2018-12-03 15:36:32 +02:00
Pauli Oikkonen c5cd03497e Require BMI and ABM instruction sets for AVX2 build
AVX2 support on a processor should always imply BMI and ABM support.
The lzcnt and tzcnt instructions have more suitable semantics in the
corner case that source word is 0, and allow us to even handle that
scenario without a branch. Apparently Visual Studio will already
include this support when building with AVX2 enabled, so only the
automake files need to be tweaked.
2018-12-03 15:36:32 +02:00
Reima Hyvönen f8696b54a4 Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks that can be not equal to 8 (ie. width = 12) 2018-11-20 17:09:19 +02:00
Marko Viitanen a5a10a33c3 Enable --scaling-list parameter and add to the documentation 2018-11-19 10:47:30 +02:00
Reima Hyvönen 710ba288db Chroma has some problems 2018-11-15 16:42:48 +02:00
Sami Ahovainio 8f98d4aac7 Added square search 2018-11-14 14:50:31 +02:00
Marko Viitanen 6871490dd5 Simplify get_mvd_coding_cost(), only include golomb coding 2018-11-14 14:33:31 +02:00
Ari Lemmetti a832206bb6 Replace 32-bit incompatible intrinsics 2018-11-12 18:54:33 +02:00
Ari Lemmetti 5c774c4105 Rewrite most of FME and interpolation filters
Changes had to break a lot of stuff and were just squashed into this horrible code dump
2018-11-08 20:21:16 +02:00
Joose Sainio 1c8a1f24e2 Don't assume anything about bits spent 2018-11-07 16:03:38 +02:00
Joose Sainio 3471e2470d Fix using uninitialized value for the first frame 2018-11-07 08:17:39 +02:00
Joose Sainio d95ac11a3b Fix rate_control for other LP-GOPS 2018-11-06 14:20:44 +02:00
Joose Sainio 67a6ba667e Fix rate control for flat lp-gop 2018-11-06 09:38:17 +02:00
Reima Hyvönen 7406c33a42 Some more cleaning 2018-10-26 12:25:18 +03:00
Reima Hyvönen 4c71546b2e Cleaned some coding 2018-10-26 12:19:44 +03:00
Reima Hyvönen 4fe3909e48 Switched luma to use 32-bit ints instead of 16-bit 2018-10-24 18:24:46 +03:00
Marko Viitanen 465bc2cfee [EMT] make functions static and prefix arrays with kvz_g 2018-10-18 10:54:33 +03:00
Marko Viitanen b133e7de1e VTM 2.2 changed -> remove high_precision_motion_vectors flag 2018-10-17 12:41:14 +03:00
Marko Viitanen 169febd1c4 [EMT] Simplify DCT8, DCT5, DST1 and DST7 definitions 2018-10-17 12:17:54 +03:00
Marko Viitanen e015d7eb2b Fix compiler warnings 2018-10-17 10:43:11 +03:00
Marko Viitanen ad310c77d3 Added EMT transforms to the strategies 2018-10-17 08:56:49 +03:00
Eemeli Kallio 284e73839e Calculating zero cost moved to its own function 2018-10-16 11:02:01 +03:00
Reima Hyvönen 381e786e10 Trying to find the bug in luma 2018-10-11 18:08:41 +03:00
Marko Viitanen c589e5ed36 Fix closed-gop frame feed, the ordering was incorrect after the first GOP 2018-10-10 11:12:03 +03:00
Reima Hyvönen 2f5f81bac3 removed the non-optimized bipred function 2018-10-09 11:19:23 +03:00
Marko Viitanen 75dce4f3ce Fix low-delay-gop usage with --no-open-gop 2018-10-04 15:16:02 +03:00
Marko Viitanen de71b58f76 Change closed GOP structure to include an additional IDR between GOPs 2018-10-04 11:17:03 +03:00
Marko Viitanen 1e1a80e4a6 [TMVP] fix clamping of block offsets and clean up the code a bit 2018-10-03 12:34:48 +03:00
Reima Hyvönen 212a8e68fa Modified to avoid memory overflow, still some bug inside luma 2018-10-02 20:23:32 +03:00
Marko Viitanen 954f07e3d7 Add --(no-)open-gop option 2018-10-02 10:05:32 +03:00
Marko Viitanen 027359c3c3 Implement TMVP duplicate checking as in VTM 2.1 2018-09-28 11:50:36 +03:00
Marko Viitanen 571a545416 Fix spatial merge candidate selection 2018-09-26 15:10:31 +03:00
Marko Viitanen 63760ca0cf Use kvz_cabac_bins_verbose flag to control cabac debug printing 2018-09-26 12:01:23 +03:00
Marko Viitanen 7c37f456f9 Fix implicit Qt split for p-frames 2018-09-26 12:00:18 +03:00
Marko Viitanen b6f2c66c73 Fixed intra Most Probable Mode (mpm) derivation to conform to VTM 2.1 2018-09-21 10:33:54 +03:00
Sami Ahovainio a2b2275d87 Fixed array sizes in search_intra_rough from 35 to 67 2018-09-18 11:49:15 +03:00
Sami Ahovainio 82fb80ab6e Fixed couple of if-clauses which still used the old intra mode range. 2018-09-17 08:56:43 +03:00
Marko Viitanen a437d4c508 Fixed intra chroma mode bitstream writing (chroma search not used) 2018-09-13 15:05:00 +03:00
Marko Viitanen 389aeebe07 Added 2x2 transform functions 2018-09-13 14:51:07 +03:00
Marko Viitanen 445c059b4a Fix transforms for VTM 2.0, generated new transform matrices and added a shift by 2 for forward and inverse 2018-09-13 14:39:49 +03:00
Marko Viitanen 35fa8e9785 Fix kvz_intra_get_dir_luma_predictor -> Intra working 2018-09-13 12:32:17 +03:00
Marko Viitanen f75b0b11c3 Simplify intra filtered ref pixel selection 2018-09-13 10:09:52 +03:00
Sami Ahovainio 4bb484a86a Fixed if-clause at search_intra.c to use new wider range of intra modes 2018-09-13 09:58:48 +03:00
Marko Viitanen 82de0fbee7 Switch intra search to use the actual 67 modes 2018-09-13 09:43:45 +03:00
Marko Viitanen 382917bcd3 New table for choosing angular intra filtered references and a small bugfix on the end condition of angular intra 2018-09-13 09:35:55 +03:00
Marko Viitanen 4aad2fa383 Fix intra mode writing 2018-09-12 10:34:58 +03:00
Marko Viitanen d4ed0ee3ad Fixed some array offsets in intra angular prediction 2018-09-12 08:53:17 +03:00
Marko Viitanen 20c96366ed fix kvz_context_get_sig_ctx_idx_abs() parameter for "type" -> decoding with VVC 2018-09-10 12:51:02 +03:00
Marko Viitanen a7ca09108c Improve CABAC debugging by including similar info as in VTM 2018-09-10 11:00:00 +03:00
Sami Ahovainio ce84407c69 Fixed coeff_remain writing to use the correct rice_param instead of using 0 all the time. 2018-09-07 11:24:24 +03:00
Sami Ahovainio 78ea24bcf1 Fixed sig_coeff_flag writing condition. 2018-09-06 15:48:45 +03:00
Marko Viitanen 4bebb4bb2c Fix temp_diag and temp_sum initialization and coeff array usage in context derivation 2018-09-05 17:09:50 +03:00
Marko Viitanen f5b6c386bc Fix incorrect sig_flag implicity parameters and some temp variable initializations 2018-09-03 16:22:05 +03:00
Marko Viitanen 8bef85e056 Merge branch 'set-qp-in-cu' 2018-09-03 08:33:33 +03:00
Ari Lemmetti 2fdcc2b79d Add option --set-qp-in-cu 2018-09-03 08:32:45 +03:00
Marko Viitanen 52be2f0bbe Fixed kvz_encode_coeff_nxn and renamed some variables to match VTM 2018-08-31 15:10:17 +03:00
Sami Ahovainio 787264f568 Fixed dst indexing in kvz_angular_pred_generic 2018-08-31 10:36:28 +03:00
Sami Ahovainio d2291fea83 Intra mode scaling moved from angular prediction to kvz_intra_predict. pdpc implemented in kvz_intra_predict. 2018-08-31 10:01:28 +03:00
Marko Viitanen 49a116ed3a Bugfix correct array sizes for cu_ctx_last_x/y 2018-08-30 16:14:08 +03:00
Sami Ahovainio 84cef127dc Fixed cu_gtx_flag_model_chroma initialization. 2018-08-30 15:21:16 +03:00
Marko Viitanen 7d491e639b Add new values to last_x/y coding 2018-08-30 15:04:04 +03:00
Marko Viitanen 809805b185 Bugfixes for kvz_encode_coeff_nxn() 2018-08-30 14:50:29 +03:00
Marko Viitanen 0680f240d7 Converted kvz_encode_coeff_nxn and related helper functions to VVC K0072 format 2018-08-30 14:24:03 +03:00
Marko Viitanen 84e78c6c50 Disable writing of cabac flags not currently available 2018-08-30 11:21:44 +03:00
Marko Viitanen e3dbaf99a9 Started implementing new coeff coding function
- added kvz_context_get_sig_ctx_idx_abs for abs sig context derivation
2018-08-30 11:09:42 +03:00
Marko Viitanen e00319b832 Fix cu_sig_coeff_group_model init and some instances of cu_sig_model usage 2018-08-30 09:08:08 +03:00
Marko Viitanen 4429e0b89d Expand cu_sig_coeff_group_model according to VVC 2018-08-29 16:20:34 +03:00
Sami Ahovainio 578122ed43 Context changes for chroma pred modes. BT flag init and chroma pred mode init moved inside a loop. 2018-08-29 16:00:08 +03:00
Sami Ahovainio 54ebadfc43 Clarifying comments and changes towards WAIP 2018-08-29 16:00:08 +03:00
Marko Viitanen 7f119e8bdd Added new ctx models for sig, parity and gtx, removed models for one and abs 2018-08-29 15:57:40 +03:00
Marko Viitanen 46d02c1734 Implemented JVET-K0072 based cbf context selections 2018-08-29 10:12:07 +03:00
Marko Viitanen bb9dc22336 Disable PCM 2018-08-29 09:59:53 +03:00
Marko Viitanen 23a1292f52 Added max_binary_tree_unit_size and more comments 2018-08-29 08:23:41 +03:00
Marko Viitanen 37caa451c6 Fix VVC split flag condition for hor and ver splits at the edges
- Split flag is no longer implicit when the block can be split with the BT after QT in horizontal or vertical way
2018-08-28 16:03:02 +03:00
Reima Hyvönen 896034b7cf Some renamed functions back 2018-08-28 15:31:10 +03:00
Reima Hyvönen e8b5e6db4c Did some merging 2018-08-28 15:26:27 +03:00
Reima Hyvönen 7de5c74434 Updated bipred_recon to work faster 2018-08-28 15:12:31 +03:00
Reima Hyvönen 47b357cca2 Comment one test 2018-08-27 18:52:14 +03:00
Reima Hyvönen 2ca99a44e8 Updated shuffle operation to be in right order 2018-08-27 18:16:38 +03:00
Sami Ahovainio 42741a2c40 Some changes for PCM and Intra towards VTM 2.0 compatibility. 2018-08-27 09:18:15 +03:00
Marko Viitanen 3dc5f65fba Add an extra bit to intra mode and map 33 angular modes to 65 2018-08-17 15:09:48 +03:00
Marko Viitanen 9aaf53fcd7 Add dep_quant_enable_flag to slice header 2018-08-17 14:58:57 +03:00
Marko Viitanen dc92fa6fb3 Added missing ALF flag to SPS 2018-08-17 12:53:27 +03:00
Marko Viitanen dbc74c592d Add VTM 2.0 new flags to SPS 2018-08-17 12:47:29 +03:00
Marko Viitanen 17505c8306 Disable vertical and horizontal scan order with small blocks
- Intra now working down to 8x8 luma
2018-08-17 11:38:40 +03:00
Marko Viitanen 4f7da86285 Commented out sign hiding code, which is not used in VVC 2018-08-17 09:38:11 +03:00
Marko Viitanen c9cbdd5dc3 Added couple of ToDo comments for large CTU support 2018-08-17 09:37:14 +03:00
Marko Viitanen daf041406f Disable DST 2018-08-16 16:05:32 +03:00
Marko Viitanen b85ae3688e Signal QP in slice header if tiles and slices=tiles are enabled
Keeps the PPS constant for various purposes
2018-08-16 08:44:39 +03:00
Sami Ahovainio 5baab86597 Added BT split flags 2018-08-14 15:28:06 +03:00
Marko Viitanen b33aa37484 Enable max_trans_hier_depth values and disable DC and angular filtering 2018-08-14 15:24:21 +03:00
Marko Viitanen 00a827007a Use normal split flags 2018-08-14 10:57:32 +03:00
Reima Hyvönen 508b218a12 some modifications made to prevent reading too much 2018-08-14 10:50:39 +03:00
Reima Hyvönen 1d935ee888 some useless stuff removed 2018-08-13 16:47:11 +03:00
Reima Hyvönen ce3ac4c05e some modifications to no_mov 2018-08-13 16:41:02 +03:00
Reima Hyvönen 15a613ae94 test if no_mov breaks testing 2018-08-13 16:02:56 +03:00
Reima Hyvönen 97a2049e58 removed pointer declaration out from switch 2018-08-10 16:42:26 +03:00
Reima Hyvönen aa94bcedbc Stream is now pointer 2018-08-10 16:38:49 +03:00
Reima Hyvönen fa5b227ece 256 to 32 doesn't work, made them by hand 2018-08-10 16:01:20 +03:00
Reima Hyvönen 408dedbcc8 removed _mm256_extract_epi8 and replaced with _mm_stream 2018-08-10 15:53:26 +03:00
Reima Hyvönen 31c35091c6 _mm256_cvtsi256_si32 removed 2018-08-10 10:06:40 +03:00
Reima Hyvönen 99dc43074f _mm256_cvtsi256_si32 breaks system, too many bits. back to extract 2018-08-10 09:59:33 +03:00
Reima Hyvönen 4f1f80b2cb Transformed convert from 256 to cast 256 -> 128 and then convert from 128 2018-08-09 15:35:54 +03:00
Reima Hyvönen 4957555eb3 Removed leftover from 939 2018-08-09 15:25:03 +03:00
Reima Hyvönen 28b165c971 Clarified some sections, added _MM_SHUFFLE macro 2018-08-09 15:23:01 +03:00
Reima Hyvönen dd04df8667 testing if error in both avx2 functions 2018-08-03 11:49:00 +03:00
Reima Hyvönen ed50d71fde Switched some variables to different location, altered inter_recon_bipred_avx2 function 2018-08-02 16:08:59 +03:00
Reima Hyvönen f5739a0028 Renaming and removing useless prints 2018-08-02 14:47:17 +03:00
Reima Hyvönen bc09f59bb6 Edited some definitions 2018-08-02 11:54:53 +03:00
Marko Viitanen ffbc178cf9 An attempt to fix checksums 2018-07-27 14:38:05 +03:00
Marko Viitanen 84b6a61193 Hack to fix split flag model for PCM use -> valid VVC bitstream 2018-07-27 14:29:31 +03:00
Marko Viitanen 90174f1143 Add more values to cabac debugging 2018-07-27 13:59:54 +03:00
Marko Viitanen c6572d644f Updated split_flag initialization to support Large CTUs in VVC 2018-07-27 12:32:45 +03:00
Marko Viitanen 7abadaafe4 Disable CTU splitting and configure max CTU sizes to 64x64 2018-07-27 11:04:21 +03:00
Marko Viitanen 6921e31502 Fix debugging functions 2018-07-27 11:03:16 +03:00
Marko Viitanen 37b5ce3d33 Change configurations to ease VVC debugging, max-BT-depth = 0 2018-07-26 16:12:11 +03:00
Marko Viitanen 792da1b7e0 Force PCM coding and fix PCM sample output 2018-07-26 11:05:31 +03:00
Marko Viitanen 5d4a2a004f Remove dependent slice, wpp/tile and scaling list parameters from PPS 2018-07-26 10:43:21 +03:00
Marko Viitanen 31a6cbfe6d Disable sign bit hiding 2018-07-26 10:41:35 +03:00
Marko Viitanen 9f2b429c66 Disable some features not used in VVC
- Part mode coding not used
- split transform flag not used
- last significant coeff pos swapping not used
2018-07-26 10:33:27 +03:00
Marko Viitanen e84276f7f6 Fixed version string 2018-07-26 08:17:55 +03:00
Marko Viitanen e38109d102 Enable QTBT and set correct general_profile_idc for Next 2018-07-25 12:24:17 +03:00
Marko Viitanen 079ca9b8b2 Disable tile/wpp flags in slice header 2018-07-25 11:19:53 +03:00
Marko Viitanen b0ac7002e5 Disable VPS 2018-07-25 11:02:09 +03:00
Marko Viitanen c5bf6a3774 Bugfix: add missing parameters to WRITE_U 2018-07-25 10:18:48 +03:00
Marko Viitanen 9befe35961 Modify slice header to conform to VVC 2018-07-25 10:17:42 +03:00
Marko Viitanen 95ce1e1a25 Modify parameter sets to conform to VVC 2018-07-25 10:05:11 +03:00
Arttu Ylä-Outinen 83555c3d6d Enable --fast-residual-cost with fastest presets 2018-07-16 12:31:20 +03:00
Arttu Ylä-Outinen c438bb4a19 Add an option to skip CABAC for residual costs
Adds command line option --fast-residual-cost=<limit>. When QP is below
the limit, estimates the cost of coding the residual coefficients from
the sum of absolute coefficients. Skipping CABAC is not worth it with
high QPs because there are fewer coefficients so CABAC is not as slow.
2018-07-16 12:31:20 +03:00
Reima Hyvönen a4bf77f208 Tested some extract functions 2018-07-12 09:29:32 +03:00
Reima Hyvönen c05033a893 Even more useless vectors removed 2018-07-11 15:09:14 +03:00
Reima Hyvönen 884cb77238 Removed some not used vectors 2018-07-11 15:06:11 +03:00
Reima Hyvönen 792689a5ff Removed for-loops, added extract instead 2018-07-11 14:56:41 +03:00
Reima Hyvönen f9c7f6ee66 Added some break-operations for avx2 optimization 2018-07-11 14:15:38 +03:00
Reima Hyvönen cc064da143 some more optimization for bipred 2018-07-11 11:27:54 +03:00
Reima Hyvönen 9a339eef89 Merge branch 'bipred_recon' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into HEAD
# Conflicts:
#	build/kvazaar_lib/kvazaar_lib.vcxproj
2018-07-10 16:21:04 +03:00
Reima Hyvönen a22cf03ddb Updated to have no movement function to avx2 strategies 2018-07-10 16:07:15 +03:00
Arttu Ylä-Outinen b7474eb532 Fix SAO buffer sizes
Increases sizes of buffers used for SAO reconstruction to avoid stack
buffer overflow in AVX2 SAO reconstruction.
2018-07-05 15:56:30 +03:00
Arttu Ylä-Outinen b37470e80f Merge pull request #207 from jbeich/maltivec
Unbreak build on PowerPC if AltiVec isn't supported
2018-07-04 11:06:41 +03:00
Reima Hyvönen ea83ae45f0 Working solution 2018-07-03 11:18:51 +03:00
Jan Beich 4f4bea7496 Check -maltivec is supported before using
PowerPC target may lack or have non-standard FPU:

$ cc -dumpmachine
powerpcspe-undermydesk-freebsd
$ cc -c -maltivec -Isrc src/strategies/altivec/picture-altivec.c
src/strategies/altivec/picture-altivec.c:1: error: AltiVec and E500 instructions cannot coexist
2018-07-02 23:25:23 +00:00
Jan Beich b892d820f8 Clean up macOS includes on powerpc* after 93e1c9f1c3
strategyselector.c:426:25: machine/cpu.h: No such file or directory
2018-07-02 21:52:45 +00:00
Reima Hyvönen 17babfffa4 25.6 working optimization, ~50% faster than original 2018-06-25 17:06:16 +03:00
Arttu Ylä-Outinen 2f995f4325 Merge pull request #205 from jbeich/powerpc
Unbreak build on non-Linux powerpc*
2018-06-19 13:28:00 +03:00
Arttu Ylä-Outinen c1398ef818 Permit --period=1 with any GOP structure
All intra coding is a special case so it can be permitted even though
Kvazaar normally only supports intra periods that are divisible by the
GOP length.
2018-06-18 12:26:11 +03:00
Arttu Ylä-Outinen abdebe0bf9 Fix --owf help message
The number of parallel frames is --owf plus one, not --owf minus one.

Fixes #204.
2018-06-18 09:33:36 +03:00
Jan Beich 93e1c9f1c3 Add AltiVec detection for BSDs
strategyselector.c:377:26: linux/auxvec.h: No such file or directory
2018-06-17 15:38:24 +00:00
Miika Metsoila 98972d26c2 Document that the high tier requires level 4 or higher 2018-06-14 12:41:03 +03:00
Miika Metsoila 62b44efaa4 Write the encoding tier (main/high) into the bitstream 2018-06-14 12:41:03 +03:00
Arttu Ylä-Outinen a343f6d587 Prepare for delta QPs at CU-level
- Replaces lcu_dqp_enabled with max_qp_delta_depth in encoder_control_t.
- Fixes set_cu_qps so that it can handle quantization groups of
  arbitrary size.
- Fixes computation of QP predictors so that it works for quantization
  groups of arbitrary size.
2018-06-13 15:36:19 +03:00
Arttu Ylä-Outinen dc6b2024ea Modify reference count asserts to fix data races
Changes asserts on the reference count of objects to assert the value
after KVZ_ATOMIC_INC instead of directly checking the value. Fixes some
data races detected by TSan.
2018-06-12 09:35:07 +03:00
Ari Lemmetti 4fb1c16c61 Add early termination for intra rdo when a zero coefficient block is found. 2018-06-08 21:03:07 +03:00
Ari Lemmetti 492529fb7a Add the same comment to help message as well... 2018-05-30 14:13:15 +03:00
Ari Lemmetti 0d5972bf03 Add missing sort to intra transform split search so mode at 0 is the best 2018-05-21 13:10:38 +03:00
Sebastien Alaiwan 954bca7d6e Fix memset parameter 2018-05-17 11:24:49 +02:00
Jaakko Laitinen f9466efcbb Close file on error 2018-05-15 11:50:16 +03:00
Reima Hyvönen 9fed29f950 optimization for inter_recon_bipred 2018-04-18 15:25:44 +03:00
Arttu Ylä-Outinen 5c585c4fbc Update help message
Updates the default option values to match the medium preset.
2018-04-03 10:40:37 +03:00
Arttu Ylä-Outinen 2b4e22111a Update presets
The new presets are slower but have better coding efficiency.
2018-04-03 10:37:30 +03:00
Arttu Ylä-Outinen 7185519a1b Update command line help
- Adds missing default values.
- Adds help for --crypto and --key.
- Adds help for --rd=3.
- Adds help for --sao options.
- Some changes to help wording.
2018-03-23 14:33:04 +02:00
Arttu Ylä-Outinen 3606860504 Add --no-cpuid option
Equivalent to --cpuid=0.
2018-03-23 12:32:27 +02:00
Arttu Ylä-Outinen fb462b25ef Fix transform skip for inter
The transform skip flag in cu_info_t was stored under the intra
substruct even though transform skip can be used for inter as well. This
caused bitstream errors. Fixed by moving the flag out of the substruct.
2018-03-20 11:01:33 +02:00
Arttu Ylä-Outinen b64e46707d Skip raster scan step in TZ search
Raster scan is very slow and the BD-rate improvement is marginal.
2018-03-01 14:04:03 +02:00
Arttu Ylä-Outinen 6877064230 Add zero neighborhood check to TZ search
Adds an additional grid search step that starts from the zero motion
vector after the normal grid search. The search range for this step is
half of the normal range.
2018-03-01 14:02:13 +02:00
Arttu Ylä-Outinen 74a413c46a Switch to star refinement in TZ search 2018-03-01 13:06:14 +02:00
Arttu Ylä-Outinen ebee428ee1 Add loop termination to TZ grid search
Terminates the grid search if no better motion vector was found in the
last three iterations.
2018-03-01 13:06:06 +02:00
Arttu Ylä-Outinen 4c175621dd Fix TZ grid search and star refinement
- Changes TZ grid search and star refinement to keep the origin constant
  instead of moving to the best position after each iteration.
- Changes star refinement to loop until there is no more improvement,
  instead of running the step only once.
2018-03-01 12:56:57 +02:00
Arttu Ylä-Outinen 9c2d0074a2 Add rounding of motion vectors in inter search
When the starting point for integer motion estimation was selected among
the merge candidates, the candidate motion vectors were always rounded
down. This commit changes the rounding so that they are rounded to the
nearest integer MV instead.
2018-03-01 09:39:21 +02:00
Ari Lemmetti 662430d441 Select CU type based on SSD, transform unit tree and mode cost of luma and chroma on --rd=2 2018-02-22 19:26:48 +02:00
Arttu Ylä-Outinen cb06cfeadb Drop temporary arrays in bipred search
Changes bipred search to use the original source and reconstruction
arrays directly instead of copying them.
2018-02-14 11:20:51 +02:00
Arttu Ylä-Outinen 0ea516ba30 Move bipred search to a separate function 2018-02-14 09:56:53 +02:00
Arttu Ylä-Outinen 6f506be12d Drop dynamic allocation from bipred search
Moves the temporary LCU struct used in bipred search from the heap to
the stack. The single malloc call was a huge bottleneck in bipred.
2018-02-14 09:55:02 +02:00
Arttu Ylä-Outinen 7155dd0db7 Add negative references to L1 list
Changes reference index list creation so that the negative references
are added to L1 in addition to L0 when biprediction is enabled and no
reordering of pictures is done. Biprediction can now be used with the
low-delay GOP structure.
2018-02-07 14:54:52 +02:00
Arttu Ylä-Outinen 4b24cd03a2 Update for crypto++ 6.0.0 compatibility
Changes the crypto module to use unsigned char instead of byte. The byte
typedef is no longer included in the global namespace in crypto++ 6.0.0.
See https://github.com/weidai11/cryptopp/issues/442.

Fixes #184.
2018-02-05 13:35:03 +02:00
Arttu Ylä-Outinen 8c53417006 Check zero coefficient cost for inter
Checks the cost of flushing all coefficients of an inter block to zero.
This is much faster than doing full RDOQ but can still reduce bitrate
significantly. Encoding speed is increased since fewer coefficient bits
have to be coded with CABAC.
2018-01-29 12:41:56 +02:00
Arttu Ylä-Outinen 018b5ffa64 Move inter CU reconstruction to a new function
Moves code for reconstructing all PUs in an inter CU to a new function
kvz_inter_recon_cu in inter.c.
2018-01-24 15:05:39 +02:00
Arttu Ylä-Outinen 405b8c1069 Refactor inter MVD cost functions
Moves duplicate code for writing the MVD of a single motion vector from
kvz_get_mvd_coding_cost_cabac and encoder_inter_prediction_unit to a new
function.
2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen c1cca1ad7f Refactor inter MV candidate selection
Moves duplicate code for checking the best MV candidate from functions
calc_mvd_cost, search_pu_inter_ref and search_pu_inter to a new
function.
2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen 9067aa4535 Remove an unnecessary copy in SMP/AMP search
SMP/AMP search is performed using a lower work tree level than the
normal inter search so the prediction info must be copied up if an
SMP/AMP mode is chosen. Previously pixels and coefficients were copied as
well. Changed to only copy prediction info.
2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen 89a930d6dd Add part mode bitcost when using SMP/AMP blocks 2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen fc43643ba5 Use a transform split for SMP and AMP blocks 2018-01-18 10:36:25 +02:00