Commit graph

2019 commits

Author SHA1 Message Date
Ari Koivula b8a618e666 Fix problems with >8 bit input
Enforce bit depth promised by --input-bitdepth to avoid crashes when
larger values are provided.

Do endianess byte swap for all bytes when the buffer gets extended
to multiple of 8 pixels, and not just the number of input pixels.

Don't swap bytes on a little-endian system.
2016-11-13 19:58:54 +02:00
Ari Koivula 2c005cda25 Fix bug with sub-pixel motion estimation in tiles
The width of the tile was being used to index the frame pixel buffer
instead of the width of the buffer.
2016-11-07 15:53:52 +02:00
Ari Koivula 78a28e0338 Reformat --help message
- Reduce indentation to 6 spaces
- Word wrap everything to under 80 characters
- Remove defaults from options covered by presets
- Add a dash in front of argument descriptions
- Add --(no-) to names of parameters that accept it and remove mention
  of enabling or disabling
- Add executable and scripts as a dependancy to make docs
2016-11-04 15:40:28 +02:00
Ari Koivula d18de19d8a Fix DTS and PTS not being passed on through lib API
Fixes "cur_dts is invalid" warning from FFmpeg.
2016-10-28 19:05:47 +03:00
Ari Koivula 0c41c2ebd6 Make CLI set PTS for each input picture
This value is not represented in the HEVC bitstream, which is why it
was not set previously. FFmpeg sets and needs it however, so make the
CLI set it as well to make sure we handle it correctly.
2016-10-28 19:03:03 +03:00
Ari Koivula 5bf745460d Re-categorize options in the help message
- Move VUI stuff to the bottom
- Merge Parallel processing, WPP, Tiles and slices
- Add more categories for the other options
2016-10-27 03:26:15 +03:00
Ari Koivula cb6672b452 Disable WPP when Tiles are enabled
Closes #142.
2016-10-27 02:07:10 +03:00
darealshinji 488d042e5f Bump KVZ_VERSION 2016-10-25 12:32:13 +02:00
Ari Lemmetti 29153ed503 Remove unused variable 2016-10-21 17:28:42 +03:00
Ari Lemmetti 778e46dfd8 Add AVX2 version of SSD 2016-10-21 15:07:53 +03:00
Ari Lemmetti 6f5d7c9e06 Move SSD to strategies 2016-10-21 15:07:23 +03:00
Ari Lemmetti 89b941eab4 Fix typo 2016-10-21 15:07:02 +03:00
Alexis Ballier 1dcc993743 Include i386 & i486 for compiling intel asm.
x86_64-pc-linux-gnu-gcc -m32 that I use for building 32bits libraries on amd64 defines only __i386__.
2016-10-14 18:07:37 +02:00
Arttu Ylä-Outinen 5fb7afe8c4 Add --implicit-rdpcm command line parameter.
Makes it possible to use lossless coding without implicit residual DPCM.
2016-10-03 20:01:55 +09:00
Arttu Ylä-Outinen 5affc0f527 Use implicit RDPCM in lossless mode.
Sets implicit RDPCM flag in SPS when lossy coding is disabled and
applies DPCM to intra residual when prediction mode is horizontal or
vertical.
2016-10-03 19:31:38 +09:00
Ari Koivula 016dbe0894 Further refine presets
The rd-complexity of slow presets is better with a less agressive GOP.

Adding the GOP as part of the preset improved BDRate enough, that it
didn't make sense anymore to have a veryslow target the best BDRate.
Instead, push that responsibility to placebo by making it a little bit
faster.
2016-09-29 17:35:12 +03:00
Ari Koivula 31c5ff0f16 Add cross-platform core number detection
Well, turns out pthread_num_processors_np isn't standard so we need to
do this crap. Threw in hyper threading detection as a bonus.
2016-09-29 00:03:21 +03:00
Ari Koivula 8c7351eac8 Fix lp-gop with depth 1
GOPs with depth 1 had the same structure as those with depth 2:
g4d3t1 = 3 2 3 1
g4d2t1 = 2 2 2 1
g4d1t1 = 2 2 2 1

It now results in the correct:
g4d1t1 = 1 1 1 1
2016-09-29 00:03:21 +03:00
Ari Koivula a395aeaac9 Set default settings to those of --preset=medium 2016-09-29 00:03:21 +03:00
Ari Koivula 4388fe0d30 Set presets to ratedistortion-complexity optimized versions 2016-09-29 00:03:20 +03:00
Ari Koivula facb1e16df Use -p64 -q22 and --gop=lp-g4d3t1 by default
Coding inter without GOP of any kind really isn't a very sensible
default. Defaulting to B-GOP of some kind would be more better,
but lp-gop is more robust for now.
2016-09-29 00:03:20 +03:00
Ari Koivula d7391a9593 Improve default for number of parallel frames 2016-09-29 00:03:20 +03:00
Ari Koivula 19d423ab29 Use all available cores by default 2016-09-29 00:03:20 +03:00
Ari Koivula 3f138f087a Allow non-gop-length --period for lp-gop 2016-09-29 00:03:19 +03:00
Ari Koivula 16790c9f15 Remove number of references from --gop=lp syntax
The number of references should be part of the presets, so gop should
be defined separately.
2016-09-29 00:03:19 +03:00
Ari Koivula cbfa824d1a Merge branch 'simd' 2016-09-27 20:49:45 +03:00
Ari Koivula 14a7bcba25 Use a faster function for clipped inter SAD
Use the vectorized general SSE41 inter SAD in AVX reg_sad for shapes
for which we don't have AVX versions yet.

Also improves speed of --smp and --amp a lot. Got a 1.25x speedup for:
--preset=ultrafast -q 27 --gop=lp-g4d3r3t1 --me-early-termination=on --rd=1 --pu-depth-inter=1-3 --smp --amp

* Suite speed_tests:
-PASS inter_sad: 0.898M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 2.503M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
-PASS inter_sad: 115.054M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 133.577M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
2016-09-27 20:48:30 +03:00
Arttu Ylä-Outinen 4313e56c2d Add --no-rdoq-skip command line switch 2016-09-11 17:40:16 +09:00
Ari Koivula a7a33b08ec Remove --slice-addresses from usage message
And give a warning if it's used.

Slices will have to be implemented at some point, but they aren't yet
so let's not advertize them.
2016-09-10 21:06:00 +03:00
Eemeli Kallio f41e428e5f Removed kvz_skip_unnecessary_rdoq and reworked --rdoq-skip to skip 4x4 blocks when it is on. 2016-09-09 10:26:07 +03:00
Eemeli Kallio ed9c0b0416 RDOQ reworked in rdo.c. rdoq_signhide now skips coeffs that are after best_last_idx. 2016-09-09 10:16:51 +03:00
Ari Koivula 02cd17b427 Add faster AVX inter SAD for 32x32 and 64x64
Add implementations for these functions that process the image line by
line instead of using the 16x16 function to process block by block.

The 32x32 is around 30% faster, and 64x64 is around 15% faster,
on Haswell.

PASS inter_sad: 28.744M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec)
PASS inter_sad: 7.882M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)
to
PASS inter_sad: 37.828M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec)
PASS inter_sad: 9.081M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)
2016-09-01 21:36:39 +03:00
Ari Koivula d0512d25c6 Use fixed point in get_mvd_coding_cost 2016-08-30 21:37:12 +03:00
Ari Koivula ec7507a935 Further optimize get_ep_ex_golomb_bitcost
Unrolled 16-bit log2 calculation.
2016-08-30 21:37:01 +03:00
Ari Koivula a4ba794587 Optimize get_ep_ex_golomb_bitcost
Arrange the decision tree such that there is only 3 branches on the
most common paths and the more likely branch is always fall-through.

A profile guided optimization pass would probably do something similar.
2016-08-30 05:24:16 +03:00
Ari Koivula 82cfab58f8 Improve fast mvd coding cost estimation
A lot of time is being taken up by this function on ultrafast, and it
doesn't do a very good job. This change aims to both simplify the
logic and make the estimate better.

The logic is simplified by using a look up for the step mvd bit cost
step function instead of mimicking the binarization process. The
estimation is made better by checking fractional cabac bit costs.

The new function returns the same results as
kvz_get_mvd_coding_cost_cabac, but is also faster than the old
function.
2016-08-30 04:55:09 +03:00
Ari Koivula d31be8eb27 Make mvd_coding_cost functions take const cabac 2016-08-30 04:46:46 +03:00
Ari Koivula 64d631c174 Fix 8bit to 10bit input conversion regression 2016-08-25 22:09:40 +03:00
Ari Koivula 27789125d8 Fix input bit depth conversion
The input was being shifted to the wrong direction.
2016-08-25 22:05:25 +03:00
Ari Koivula 4ec039004b Add monochrome encoding
Write bitstream without chroma when encoding with --input-format=P400.
This reduces bitstream size by 0-1 %, compared to coding monochrome in
420 format, and speeds up encoding slightly due to not processing
chroma.
2016-08-25 20:15:26 +03:00
Ari Koivula c5b70cf812 Add chroma format support to yuv_t 2016-08-24 19:20:53 +03:00
Ari Koivula 032ed30ff4 Add chroma format support to kvz_picture
Add picture_alloc_csp to libkvz api to allocated pictures with chroma
format different from 420.
2016-08-24 19:20:53 +03:00
Ari Koivula 48ccc26839 Add --input-format and --input-bitdepth
Adds reading of 10 bit input for 10-bit encoding.
2016-08-24 19:20:53 +03:00
Ari Koivula cc08073615 Refactor some indexing weirdness in init_lcu_t
I thought there might be a bug in this so I cleaned it up.
2016-08-24 19:12:48 +03:00
Ari Koivula b6d674d66e Refactor integer vector inter prediction
This code was pretty bad, so I cleaned it up a bit.
2016-08-24 19:09:26 +03:00
Ari Lemmetti 28c4174d0e Fix incorrect shuffle parameters
_MM_SHUFFLE uses reverse order
2016-08-23 19:40:46 +03:00
Ari Lemmetti ce77bfa15b Replace KVZ_PERMUTE with _MM_SHUFFLE
The same exact macro already exists
2016-08-22 19:08:46 +03:00
Jovasa 68eef660bd Fixed search around mv_in in fullsearch not being saved. 2016-08-19 15:19:29 +03:00
Eemeli Kallio 99d8b9abeb Changed skip_rdoq name to kvz_skip_unnecessary_rdoq. Changed the order it uses when it goes through CGs and tuned its sum calculation. 2016-08-18 14:02:56 +03:00
Eemeli Kallio 1fb4755f31 Added rdoq-skip to quant-generic.c 2016-08-18 12:17:54 +03:00
Eemeli Kallio d20ac03ca2 Added --rdoq-skip option 2016-08-18 12:17:53 +03:00
Marko Viitanen 83cf801664 Fixed MV constraint condition in bipred 2016-08-18 08:53:17 +03:00
Marko Viitanen 5ae1c595f2 Fixed slice_temporal_mvp_enabled_flag and disabled TMVP with tiles
- slice_temporal_mvp_enabled_flag should be signalled also with non-IDR I-slices
2016-08-10 14:51:41 +03:00
Marko Viitanen 5326519182 TMVP cleanup and const qualifier fixes 2016-08-10 14:10:43 +03:00
Marko Viitanen f40907260d Added config parameter for TMVP and cmdline option --no-tmvp
- Enabled by default
 - Cannot be used with GOP at the moment
2016-08-10 14:09:29 +03:00
Marko Viitanen fd52dac1f7 Fixed TMVP scaling 2016-08-10 14:09:28 +03:00
Marko Viitanen c664bc8cf7 Added flag collocated_ref_idx to the slice header 2016-08-10 14:09:28 +03:00
Marko Viitanen c5f2611a38 Fixes for TMVP to work with the new CU array 2016-08-10 14:09:28 +03:00
Marko Viitanen d85af5755b TMVP working when only 1 ref frame 2016-08-10 14:09:28 +03:00
Marko Viitanen 39f0165efe Fix a bug in TMVP, the reference cu_array was being overwritten 2016-08-10 14:09:27 +03:00
Marko Viitanen adab8c327e Clean TMVP code 2016-08-10 14:09:20 +03:00
Marko Viitanen 5fa8226ac9 Temporal merge candidate selection 2016-08-10 14:09:20 +03:00
Marko Viitanen f83042f4a1 Temporal MV candidate selection 2016-08-10 14:09:19 +03:00
Marko Viitanen f8671581e3 Implemented function kvz_inter_get_temporal_merge_candidates() 2016-08-10 14:09:19 +03:00
Marko Viitanen 2956bdb379 Added flag slice_temporal_mvp_enabled_flag 2016-08-10 14:09:19 +03:00
Arttu Ylä-Outinen 2a946bd88e Rename encoder_state_t.global to frame
"Frame" is more accurate than "global" since when OWF is used, encoder
states for each frame have their own struct.
2016-08-10 13:22:36 +09:00
Arttu Ylä-Outinen 5fbb0a8c27 Fix includes 2016-08-10 13:05:40 +09:00
Arttu Ylä-Outinen aabf6ca3ee Extract encoding code from encoderstate.c
Moves functions kvz_encode_coding_tree and kvz_encode_coeff_nxn from
encoderstate.c to encode_coding_tree.c.
2016-08-09 22:16:50 +09:00
Arttu Ylä-Outinen 803f29be8f Remove reconstructed picture allocation in lossless.
Changes encoder_set_source_picture to set the reconstructed picture to
a copy of the source picture instead of allocating a new picture when
lossless coding is used.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen aaec473a19 Refactor encoder state initialization.
- Moves allocation of the reconstructed picture after the source picture
  is set.
- Extracts main state initialization to a separate function from
  encoder_state_new_frame.
- Changes kvz_encoder_feed_frame to return the frame.
- Renames some functions to better match their purpose.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen cd7024b3a5 Skip computing SSD when using lossless coding.
The SSD is always zero since it is lossless.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen fbbe5d1844 Use kvz_pixels_calc_ssd for SSD in search.c.
Replaces loops for computing SSDs by calling kvz_pixels_calc_ssd in
search.c.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 22cc97ffb1 Fix missing field initializers. 2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 06b82bf888 Disable filters, trskip and signhide in lossless.
When lossless coding is used, deblock and SAO are skipped, transform
skip flag is not written and sign hiding is not used.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 97451ec401 Align assignments in encoder.c. 2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 1dc94663c3 Bypass transform and quantization with --lossless.
When --lossless is given, set cu_transquant_bypass_flag for every CU and
bypass transform and quantization by directly copying reference pixels
to reconstruction and the residual to coefficients.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 2113b0182d Enable PPS-level tq bypass flag with --lossless.
Sets transquant_bypass_enable_flag to true in PPS when --lossless is
given.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen a5897bbece Make cabac context initialization tables static. 2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 23e7d9bb37 Add --lossless command line parameter. 2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 5372ea432f Update README and manpage. 2016-08-03 14:25:08 +09:00
Ari Lemmetti 6bcba004ff Comment out to fix unused code error on clang. 2016-07-14 14:12:16 +03:00
Ari Lemmetti c0979ebdcb Implement AVX2 luma sampling 2016-07-14 12:53:02 +03:00
Ari Lemmetti 6244560426 Add avx2 strategy for kvz_filter_frac_blocks_luma. 2016-07-14 12:53:02 +03:00
Ari Lemmetti 9c4e9e049b Load only what is needed. Eliminate latency from hadds. 2016-07-14 12:53:01 +03:00
Ari Lemmetti 7f71cb423a Check 4 fractional pixel positions simultaneously 2016-07-14 12:52:24 +03:00
Ari Lemmetti ad445ab8a1 Transition to kvz_filter_frac_blocks_luma 2016-07-14 12:51:02 +03:00
Ari Lemmetti fccfbd2f28 Add strategy for kvz_filter_frac_blocks_luma 2016-07-14 12:51:02 +03:00
Ari Lemmetti e9c3074d32 Add buffers and definitions for upcoming filtering
Samples are to be filtered in separate blocks instead of
making one big picture with interpolated pixels
2016-07-14 12:51:02 +03:00
Ari Lemmetti 7afe7e963b Use fme_level to control the search accuracy. 2016-07-14 12:51:01 +03:00
Ari Lemmetti 5fa323bf25 Skip searching best hpel twice. Make hpel and qpel loops similar. 2016-07-14 12:51:01 +03:00
Ari Lemmetti bc98a9affa Change the search order to suit lighter fme search 2016-07-14 12:51:01 +03:00
Ari Lemmetti 2b0c8db349 Add quad satd for avx2 2016-07-14 12:50:24 +03:00
Ari Lemmetti 0ff69fd6f8 Add any size multi satd 2016-07-14 12:48:37 +03:00
Ari Lemmetti d17b9e7d6e Allow subme parameters 0-4
Update usage, presets,defaults,lib version
2016-07-12 19:49:38 +03:00
Arttu Ylä-Outinen 62ad57d0bf Fix kvz_image_list_add for zero-sized lists.
When a list does not have space for the new element, its size is
doubled. If the size of the list is zero, it would not be resized. Fixed
to always resize the list so that the new element can be added.
2016-06-22 13:35:16 +09:00
Arttu Ylä-Outinen 433e528af7 Drop unused variable in search_pu_inter.
Removes unused variable max_px_below_lcu.
2016-06-22 13:35:16 +09:00
Arttu Ylä-Outinen 7836ff6ec9 Drop unused functions.
Removes functions kvz_coefficients_calc_abs, kvz_intra_rdo_cost_compare
and kvz_rdo_cost_intra which are no longer used.
2016-06-22 13:35:15 +09:00
Arttu Ylä-Outinen e4b5840f56 Add parentheses around macro arguments in cabac.h. 2016-06-22 13:35:15 +09:00
Arttu Ylä-Outinen a387b74e51 Fix resolution auto-detection.
Only try to guess the resolution from filename when neither width nor
height is given.
2016-06-22 13:35:15 +09:00
Arttu Ylä-Outinen 097bf8f3c0 Add a typedef for mvd coding cost functions. 2016-06-20 13:56:10 +09:00