Commit graph

153 commits

Author SHA1 Message Date
Arttu Ylä-Outinen 631ef53d2a Fix inter cost calculations
Inter costs are computed using SAD except when fractional motion
estimation or bi-prediction is enabled. This commit changes
search_pu_inter_ref to recalculate the cost with SATD. Fixes inter/intra
cost comparisons since intra costs are always SATD costs.
2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen 6ce2fb1238 Add pixel offsets to encoder_state_config_tile_t
Adds fields offset_x and offset_y to encoder_state_config_tile_t.
2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen d1e64ad62b Fix undefined left shifts
Replaces left shifts by multiplications when the operand may be
a negative value. Left shift of a negative value is undefined behavior.
2017-07-20 11:15:30 +03:00
Arttu Ylä-Outinen 0850b17f96 Drop get_wpp_limit in search_inter
WPP limit for motion vectors is now computed inside fracmv_within_tile.
2017-07-05 13:22:53 +03:00
Arttu Ylä-Outinen 2a85f0f5a4 Move hard-coded MV limits to encoder_control_t
Adds field max_inter_ref_lcu to encoder_control_t. It is used to set up
inter-LCU dependencies in encoder_state_encode_leaf and restrict motion
vectors in fracmv_within_tile.
2017-07-05 13:22:53 +03:00
Arttu Ylä-Outinen bb5354f7e2 Relax inter-CTU dependencies when SAO is off
When using WPP and OWF, the first CTU of a row depends on the last CTU
of the row below in the reference frame. This is necessary when SAO is
enabled since we currently do SAO for a whole CTU row at a time. When
SAO is disabled, however, it is unnecessary to wait for the whole row.

Changes CTUs to depend only on the CTU below in the reference frame
instead of the whole row when WPP and OWF are enabled and SAO disabled.
Gives a significant speedup when running on a machine with many CPU
cores.
2017-07-05 13:21:06 +03:00
Miika Metsoila f8b6234fdb Changes to refence lists to behave more like L0/L1 lists from the specification 2017-06-27 16:05:15 +03:00
Arttu Ylä-Outinen 4b213477f0 Return best MV from inter early terminate
When using --me-early-termination=sensitive, early termination of inter
search used to always return the starting point if no tested motion
vector was good enough to continue the search. This commit changes
early_termination to always return the best motion vector and cost
found.
2017-05-18 09:05:14 +03:00
Marko Viitanen 4270d451e6 Fixed some errors after rebase 2017-02-13 15:19:24 +02:00
Marko Viitanen b4de1878be Fixed TMVP scaling and candidate selection for B-frames 2017-02-13 15:19:23 +02:00
Marko Viitanen 23be633ad7 Added TMVP merge candidate scaling for L0 2017-02-13 15:19:23 +02:00
Arttu Ylä-Outinen 51786eda67 Drop redundant fields in encoder_control_t
Some of the fields in encoder_control_t were simply copies of the
corresponding fields in kvz_config. This commit drops the copied fields
in favor of using the fields in encoder_control_t.cfg directly.
2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen e78a8dfcf5 Copy the kvz_config passed to encoder_open
The kvz_config struct is created by the user but kvazaar keeps a pointer
to it. It is easy to break things by modifying the configuration outside
kvazaar. In addition, kvazaar modifies the struct even though it is has
a const modifier.

This commit changes the field cfg in encoder_control_t to be a copy of
the kvz_config struct instead of a pointer, removing modifications to
the const struct and allowing users to do whatever they want with it
after opening the encoder.
2017-02-09 13:23:54 +09:00
Arttu Ylä-Outinen 1e6463c08b Fix inter bipred search
When the number of merge candidates was five, biprediction search would
read past the bounds of the priority list arrays. Fixed to limit the
search to the first four candidates.
2017-01-31 18:23:12 +09:00
Arttu Ylä-Outinen 46c9a483c3 Fix inter search for small SMP and AMP blocks
The function search_pu_inter_ref incorrectly rounded the coordinates of
the block to down to a multiple 8 pixels. Small SMP and AMP blocks may
start at coordinates that are not multiples of 8. Fixed by removing the
rounding.

Fixes a failing assert when --mv-constraint is used with --smp or --amp.
2017-01-29 13:34:50 +09:00
Ari Koivula 937a764987 Fix bug in --mv-constraint
Subpixel motion estimation return 0-vector when no subpixel vector is
within the constraint. Fix is to not call subpixel motion estimation
when the integer vector is not within the constraint.
2017-01-26 09:55:57 +02:00
Ari Koivula a85390d0ac Clean up code using the fixed point frac bit tables
This is to prepare for changing the code using the floating point table
to use the fixed point table instead.

This also allows reducing the size of the fractional part, which was
useful for finding every place where the the fixed point presentation
is relied upon.
2017-01-19 20:20:51 +02:00
Arttu Ylä-Outinen 640ff94ecd Use separate lambda and QP for each LCU
Adds fields lambda, lambda_sqrt and qp to encoder_state_t. Drops field
cur_lambda_cost_sqrt from encoder_state_config_frame_t and renames
cur_lambda_cost to lambda.
2017-01-09 01:24:23 +09:00
Ari Koivula 2c005cda25 Fix bug with sub-pixel motion estimation in tiles
The width of the tile was being used to index the frame pixel buffer
instead of the width of the buffer.
2016-11-07 15:53:52 +02:00
Ari Koivula d0512d25c6 Use fixed point in get_mvd_coding_cost 2016-08-30 21:37:12 +03:00
Ari Koivula ec7507a935 Further optimize get_ep_ex_golomb_bitcost
Unrolled 16-bit log2 calculation.
2016-08-30 21:37:01 +03:00
Ari Koivula a4ba794587 Optimize get_ep_ex_golomb_bitcost
Arrange the decision tree such that there is only 3 branches on the
most common paths and the more likely branch is always fall-through.

A profile guided optimization pass would probably do something similar.
2016-08-30 05:24:16 +03:00
Ari Koivula 82cfab58f8 Improve fast mvd coding cost estimation
A lot of time is being taken up by this function on ultrafast, and it
doesn't do a very good job. This change aims to both simplify the
logic and make the estimate better.

The logic is simplified by using a look up for the step mvd bit cost
step function instead of mimicking the binarization process. The
estimation is made better by checking fractional cabac bit costs.

The new function returns the same results as
kvz_get_mvd_coding_cost_cabac, but is also faster than the old
function.
2016-08-30 04:55:09 +03:00
Ari Koivula d31be8eb27 Make mvd_coding_cost functions take const cabac 2016-08-30 04:46:46 +03:00
Jovasa 68eef660bd Fixed search around mv_in in fullsearch not being saved. 2016-08-19 15:19:29 +03:00
Marko Viitanen 83cf801664 Fixed MV constraint condition in bipred 2016-08-18 08:53:17 +03:00
Arttu Ylä-Outinen 2a946bd88e Rename encoder_state_t.global to frame
"Frame" is more accurate than "global" since when OWF is used, encoder
states for each frame have their own struct.
2016-08-10 13:22:36 +09:00
Arttu Ylä-Outinen 5fbb0a8c27 Fix includes 2016-08-10 13:05:40 +09:00
Arttu Ylä-Outinen 22cc97ffb1 Fix missing field initializers. 2016-08-03 14:25:08 +09:00
Ari Lemmetti 7f71cb423a Check 4 fractional pixel positions simultaneously 2016-07-14 12:52:24 +03:00
Ari Lemmetti ad445ab8a1 Transition to kvz_filter_frac_blocks_luma 2016-07-14 12:51:02 +03:00
Ari Lemmetti e9c3074d32 Add buffers and definitions for upcoming filtering
Samples are to be filtered in separate blocks instead of
making one big picture with interpolated pixels
2016-07-14 12:51:02 +03:00
Ari Lemmetti 7afe7e963b Use fme_level to control the search accuracy. 2016-07-14 12:51:01 +03:00
Ari Lemmetti 5fa323bf25 Skip searching best hpel twice. Make hpel and qpel loops similar. 2016-07-14 12:51:01 +03:00
Ari Lemmetti bc98a9affa Change the search order to suit lighter fme search 2016-07-14 12:51:01 +03:00
Arttu Ylä-Outinen 433e528af7 Drop unused variable in search_pu_inter.
Removes unused variable max_px_below_lcu.
2016-06-22 13:35:16 +09:00
Arttu Ylä-Outinen 097bf8f3c0 Add a typedef for mvd coding cost functions. 2016-06-20 13:56:10 +09:00
Arttu Ylä-Outinen cad2d496b8 Enable 4x8 and 4x16 partition modes
Enables search for 2NxN and Nx2N partition modes for 8x8 CUs and 2NxnU,
2NxnD, nLx2N and nRx2N partition modes for 16x16 CUs.

Changes the loop for copying reconstructed luma pixels in
kvz_inter_recon_lcu to use 4 byte chunks instead of 8 byte chunks since
it is now possible to have 4 pixel wide blocks.
2016-06-16 20:23:16 +09:00
Arttu Ylä-Outinen 360f5bb8da Always use pixel coordinates for indexing lcu_t.
Removes macro LCU_GET_CU and uses LCU_GET_CU_AT_PX in its place.
2016-06-16 18:53:17 +09:00
Arttu Ylä-Outinen 46e8122d27 Add functions for indexing cu_array_t structures.
Replaces macro CU_ARRAY_AT with functions kvz_cu_array_at and
kvz_cu_array_at_const.
2016-06-16 18:52:19 +09:00
Arttu Ylä-Outinen b276a347c0 Add a macro for indexing cu_array_t.
Adds macro CU_ARRAY_AT(cu_array, x, y) to cu.h.
2016-06-15 12:25:11 +09:00
Arttu Ylä-Outinen 41e75daed7 Fix overlapping memcpy in kvz_search_cu_smp.
The destination and source pointers might be equal. Fixed by replacing
the memcpy call with a simple assignment.
2016-06-15 12:25:11 +09:00
Ari Lemmetti 29af8bcd21 Remove const to match function signature 2016-06-14 18:19:40 +03:00
Eemeli Kallio 5af6ab320c Merge branch 'me_early_terminate'
Conflicts:
	configure.ac
	src/cfg.c
	src/cli.c
	src/kvazaar.h
	src/search_inter.c
2016-06-14 15:03:35 +03:00
Arttu Ylä-Outinen 23fdeeaf10 Move mv_cand and mv_dir into a bitfield in cu_info_t.
Reduces size of cu_info_t.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen b6d793ef33 Drop field inter.mvd from cu_info_t
Instead of storing the mv differences in cu_info_t, they are computed
from the mv candidates and the motion vector. Reduces the size of
cu_info_t.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen ebb10763f1 Drop field inter.mv_ref_coded from cu_info_t.
Storing inter.mv_ref_coded in cu_info_t is unnecessary since it can be
computed from refmap and inter.mv_ref.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen 30e9ee988d Move bitcost field out of cu_info_t.inter.
The bitcost is only needed for the currently searched CU.

Fixes bitcost of the second PU being ignored when using SMP or AMP.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen 16d13ed046 Move cost field out of cu_info_t.inter
The cost is only needed for the currently searched CU.
2016-06-14 12:20:05 +09:00
Eemeli Kallio e4f1a74512 Added early termination option for motion estimation.
Conflicts:
	src/search_inter.c
2016-06-13 16:20:35 +03:00
Wassim Hamidouche 02308d1ba6 add MVs encryption 2016-06-07 10:28:30 +02:00
Eemeli Kallio 8f182ac6de Added functions select_starting_point and mv_in_merge to search_inter.c 2016-06-06 17:16:04 +03:00
Eemeli Kallio 836a3b1daa Added functions select_starting_point and mv_in_merge. 2016-06-06 12:18:33 +03:00
Ari Koivula f51a68b6fa Add different sizes of search window for full search 2016-04-21 15:11:35 +03:00
Ari Koivula 28e7548387 Fix bug in full mv search
This optimization led to some points not being searched.
2016-04-21 12:03:57 +03:00
Ari Koivula 2576aeee0b Use merge candidates in full mv search
Perform a full search window around every mv candidate and the
0-vector.
2016-04-20 20:47:11 +03:00
Ari Koivula 61fc3e87ba Run include-what-you-use fix_includes.py fix_includes.py
The includes should make more sense now and not just happen to compile
due to headers included from other headers.

Used a modified version of IWYU. Modifications were to attribute int8_t
and so on to stdint.h instead of sys/types.h and immintrin.h instead of
more specific headers.

include-what-you-use 0.7 (git:b70df35)
based on clang version 3.9.0 (trunk 264728)
2016-04-01 17:46:55 +03:00
Ari Koivula e23ed231fb Fix race condition with owf and non-square motion partitions
The OWF wpp limit code assumed square blocks, and as such did not work
correctly when height != width. This changes the relevant code to consider
both height and width.
2016-03-22 16:46:38 +02:00
Arttu Ylä-Outinen d6a3e02f16 Fix calculating reference CU index in inter search
Fixes a possible segfault when SMP or AMP blocks are used.
2016-03-22 12:55:58 +02:00
Ari Koivula 1165ae2e1f Increase --mv-constraint=frametimemargin margin
Increase the margin to be 4 luma pixels to every direction.
2016-03-14 16:02:54 +02:00
Ari Koivula f8edf28161 Fix const qualifier warning
Also set the warning to an error in VS.
2016-03-09 14:16:15 +02:00
Ari Koivula b0c3ece31e Fix race condition when deblocking is on but SAO is off
Already suspected this yesterday, but didn't want to add the code to
handle it before confirming that it's actually a problem. It is.
2016-03-09 14:02:46 +02:00
Ari Koivula 1671725c72 Fix non-determinism issue with OWF WPP margin
The previous reasoning used deblocking and fractional motion estimation
together to arrive at a margin of 4 pixels. This was wrong, and with
either of these off, half pixel chroma interpolation could use pixels
outside the intended region.

Deblocking does not currently affect the margin needed.
2016-03-08 20:18:38 +02:00
Ari Koivula aec152c953 Fix OWF mv restriction limit
The check was done in regard to the wrong dimension, allowing the
access to unfinished parts of the frame when coding multiple frames
at the same time.
2016-03-08 17:12:43 +02:00
Ari Koivula 49ea2d7b7f Fix --mv-constraint=frametile
Option --mv-constraint=frametilemargin was being used instead of
frametile.
2016-03-07 16:41:00 +02:00
Ari Koivula 81b439f4da Optimize starting point selection in tz
Avoid checking zero motion vectors multiple times. The merge candidate
list often has only one or two candidates, the other being zeroes.
2016-03-04 16:48:46 +02:00
Ari Koivula 2436702c27 Optimize starting point selection in hexbs
Avoid checking zero motion vectors multiple times. The merge candidate
list often has only one or two candidates, the other being zeroes.
2016-03-04 16:48:12 +02:00
Ari Koivula 5327b59b45 Remove KVZ_PERF_SEARCHPX
It's too invasive and we don't really need it.
2016-03-04 16:48:12 +02:00
Ari Koivula 86219aa0fc Fix non-determinism with tiles
Earlier fix that fixed the supply side of the cu_array to take tile
coordinates into account should have been accompanied with this one
that does the same thing to demand side.
2016-03-03 17:39:20 +02:00
Ari Koivula 3dcc0957f8 Deal with impossible mv constraints
If 0,0 vector is illegal, it's possible that no legal movement vector,
is found, in which case a large cost is returned instead. The cost
overflowed and there is all sorts of silliness with converting from
double to int, but I'm not going to fix all of it because when we
remove the doubles it will all get fixed.
2016-02-29 19:18:14 +02:00
Ari Koivula b1adf1576a Add --mv-constraint=frametilemargin
Add an even stricter motion vector constraint to prevent motion vectors
to fractional pixel positions that would need pixels outside the tile.
2016-02-29 19:18:14 +02:00
Ari Koivula f4ebff12b0 Combine tile mv constraint with OWF mv constraint
This also fixes movement vectors in tiles when OWF is on. The OWF mv
constraint assumed WPP, so it didn't work with tiles.
2016-02-29 14:33:06 +02:00
Ari Koivula 7981609cd0 Add --mv-constraint=frametile 2016-02-29 14:33:06 +02:00
Ari Koivula 947bae24f9 Update Doxygen documentation
Add module information to all header files.

Update all header file documentations to briefly say what they are, and
to use the javadoc format so the brief actually gets included into the
doxygen documentation.

Remove \file from implementation files, in order to not repeat the info
from the header files.

Add files under strategies and tools to Doxygen and update the Doxygen
settings to be just plain better.

Make README be the main page of Doxygen documentation.
2015-12-17 14:05:50 +02:00
Arttu Ylä-Outinen 0e33049d9e Enable full mv search once again.
- Updates function search_mv_full so that it compiles and handles
  non-square blocks.
- Enables compilation of search_mv_full.
- Sets full search radius to 32.
- Enables selecting full mv search with "--me full".
2015-12-15 12:26:26 +02:00
Arttu Ylä-Outinen 38b881c36f Implement search_frac for rectangular blocks.
Replaces parameter depth of function search_frac with parameters width
and height.
2015-12-15 11:21:45 +02:00
Arttu Ylä-Outinen 864c77f6eb Use kvz_satd_any_size in inter search.
Changes search_frac and kvz_search_cu_iter to use kvz_satd_any_size for
computing the SATDs instead of getting the SATD function with
kvz_pixels_get_satd_func.
2015-12-15 11:21:45 +02:00
Arttu Ylä-Outinen 1eebfde0c5 Make tz search work with non-square blocks.
Replaces parameter depth with parameters width and height.
2015-12-15 11:21:44 +02:00
Arttu Ylä-Outinen bdd8b1c0aa Implement 2NxN partitions in inter search.
- Try using 2NxN partitions after the usual 2Nx2N.
- Adds function kvz_search_cu_smp to search_inter module.
2015-12-15 11:21:42 +02:00
Arttu Ylä-Outinen 3236428e4d Make hexbs search work with non-square blocks.
Replaces parameter depth with parameters width and height.
2015-12-15 11:21:42 +02:00
Arttu Ylä-Outinen dc4525c0e3 Implement inter recon for non-square blocks.
Adds parameter height to functions kvz_inter_recon_lcu and
kvz_inter_recon_lcu_bipred and makes them work on non-square sizes.
Fractional reconstruction functions do not handle non-square blocks yet.
2015-12-15 11:21:41 +02:00
Arttu Ylä-Outinen a3df13fb99 Make kvz_inter_get_merge_cand work with SMP blocks.
- Replaces parameter depth with parameters width and height.
- Adds parameters use_a1 and use_b1 for disabling the use of merge
  candidates A1 and B1.
2015-12-15 11:21:40 +02:00
Arttu Ylä-Outinen 02375bf7e5 Make kvz_inter_get_mv_cand work with SMP blocks.
Replaces the depth parameter of kvz_inter_get_mv_cand with parameters
width and height.
2015-12-15 11:21:39 +02:00
Ari Koivula c94707e6e8 Fix bug with OWF+FME+deblocking
Increases the MV safety margin of OWF from 2 to 3 when deblocking
is used and 4 when both deblocking and FME are used.

Fractional pixel motion estimation can move the vector one more pixel
down causing checksum error. This fixes that error by increasing the
OWF safety margin and changes the interface, so that different margin
can be used when FME or deblocking are not in use.
2015-12-04 15:26:56 +02:00
Arttu Ylä-Outinen 21e19067fe Extract inter search in a single ref frame.
Moves code for doing inter search in a single reference frame from
function kvz_search_cu_inter to a new function search_cu_inter_ref.
2015-11-18 11:16:27 +02:00
Arttu Ylä-Outinen 8db8f3d523 Use macro SUB_SCU where possible.
Replaces expressions like (x & 0x3f) with SUB_SCU(x).
2015-11-18 11:16:26 +02:00
Arttu Ylä-Outinen 9532d79adb Add macros for indexing cu array in lcu_t.
- Adds macros LCU_GET_CU and LCU_GET_CU_AT_PX to cu.h.
- Replaces accesses to the cu array of lcu_t by calls to these macros.
2015-11-18 11:16:26 +02:00
Marko Viitanen 0cb57961b0 Use dynamically selected get_mvd_cost function for MV candidate selection 2015-11-05 14:31:37 +02:00
Marko Viitanen 4e7e9eefbf Enable usage of MV RDO with a config parameter (in hexbs, tz, frac, bipred) 2015-11-05 12:24:03 +02:00
Marko Viitanen 9a535e1c56 Added missing kvz_ prefixes and fixed some warnings 2015-11-05 09:07:59 +02:00
Marko Viitanen 1ed0d85020 Added a function for cabac mvd coding cost get_mvd_coding_cost_cabac()
Conflicts:
	src/rdo.c
2015-11-05 09:07:59 +02:00
Marko Viitanen 6a2658cc74 Added calc_mvd_cost_cabac() to calculate real bits used for motion vectors
Conflicts:
	src/rdo.h
	src/search_inter.c

Conflicts:
	src/rdo.c
2015-11-05 09:07:59 +02:00
Arttu Ylä-Outinen 173b70b53f Rename SLICE_* enum constants to KVZ_SLICE_*. 2015-09-28 10:30:56 +03:00
Ari Koivula ec2d8d6ad7 Rename _DEBUG_PERF macros to KVZ_PERF
And move them to threadqueue.h, where the things that use them are.
2015-09-15 13:03:32 +03:00
Arttu Ylä-Outinen 3a10e9e3e0 Prefix all non-static symbols with "kvz_". 2015-08-26 13:02:28 +03:00
Ari Lemmetti 6a5eaf08de Rename extend_borders to get_extended_block. Add kvz_ prefix to type definition. 2015-08-17 15:01:46 +03:00
Ari Lemmetti d82582c37c Changes to extend border function.
Now outputs a pointer to a block with guaranteed padding for filtering.
Only generate extra pixels if samples are needed out of bounds.
Use memcpy otherwise.
2015-08-17 15:01:46 +03:00
Ari Lemmetti 7cd4f7a5c9 Enable fractional motion vectors with bipred 2015-08-10 18:49:12 +03:00
Ari Lemmetti 650dd7d840 Use pixels_blit to copy neccessary pixels. 2015-08-10 17:52:00 +03:00
Ari Koivula 022d28ab11 Fix small hexbs pattern.
- Who could mess this up? Oh.. right.
2015-07-21 16:12:44 +03:00