- Handle input processing in a separate thread to allow main thread more time with thread handling etc
- Significant speedup can be seen when run on ultrafast settings and on a system with great number of cores
Option -mavx2 was omitted when compiling AVX2 strategies. This commit
moves strategies to convenience libraries so that their compilation
flags can be easily set and adds -mavx2 to CFLAGS of the AVX2 library.
We have soname versioning now, so we should focus on getting that right
instead. This also serves as an example of correctly incrementing the
lib-version.
Now that we put the timing info into the bitstream, the time base must
be precisely known. Represent framerate as a fraction and add timing
info only if the old floating point framerate was not used.
Deprecate cfg->framerate so it can be removed once we get patches to
FFmpeg and libav.
Add support for (num)/(denom) format to --input-fps.
Also moves CLI stuff under CLI project, so they are compiled as their
own lib just like when the Makefile is used.
The file interface_main.c was an artifact from a bygone era and should have
been deleted long ago.
Add module information to all header files.
Update all header file documentations to briefly say what they are, and
to use the javadoc format so the brief actually gets included into the
doxygen documentation.
Remove \file from implementation files, in order to not repeat the info
from the header files.
Add files under strategies and tools to Doxygen and update the Doxygen
settings to be just plain better.
Make README be the main page of Doxygen documentation.
Bits were being added to rate distortion without being multiplied by
lambda in a few places. Fixing this bug also finally allows us to remove
the magic bits from the Coding Unit split decision.
I tried to find new optimum value for CU_COST and it turned out to be 2
for veryslow and 0 for superfast. The difference between 0 and 2 on
veryslow was only 0.1% however, so I don't think this parameter is
needed any longer. Before this fix the effect of removing CU_COST would
have been 0.8%.
- Updates function search_mv_full so that it compiles and handles
non-square blocks.
- Enables compilation of search_mv_full.
- Sets full search radius to 32.
- Enables selecting full mv search with "--me full".
Changes search_frac and kvz_search_cu_iter to use kvz_satd_any_size for
computing the SATDs instead of getting the SATD function with
kvz_pixels_get_satd_func.
Adds strategy satd_any_size for generic and AVX2. The satd_any_size
functions are implemented with macro SATD_ANY_SIZE defined in
strategies-picture.h.
- Adds function is_pu_boundary.
- Moves code for filtering an edge of a single PU or TU to a new
function filter_deblock_unit.
- Replaces recursive CU tree traversal in filter_deblock_cu with
a simple loop and renames it to filter_deblock_lcu_inside.
Adds parameter block_height to functions inter_recon_frac_luma,
inter_recon_14bit_frac_luma and inter_recon_14bit_frac_chroma so that
they can handle SMP blocks.
Makes the following functions static since they are not used outside
inter.c:
- kvz_inter_recon_frac_luma
- kvz_inter_recon_14bit_frac_luma
- kvz_inter_recon_frac_chroma
- kvz_inter_recon_14bit_frac_chroma
The buffers allocated in functions kvz_get_extended_block_avx2 and
kvz_get_extended_block_generic were too small when the width of the
block was less than its height. Fixed to allocate correctly sized
buffers.
Adds parameter height to functions kvz_inter_recon_lcu and
kvz_inter_recon_lcu_bipred and makes them work on non-square sizes.
Fractional reconstruction functions do not handle non-square blocks yet.
- Moves SIZE_* definitions to cu.h.
- Adds constant arrays kvz_part_mode_num_parts, kvz_part_mode_offsets
and kvz_part_mode_sizes for storing the number of PUs, PU offsets and
PU sizes.
- Adds macros PU_GET_X, PU_GET_Y, PU_GET_W and PU_GET_H for getting the
location and size of a PU.
Moves checks for motion vector prediction and merge candidate block
types (inter/intra) from functions kvz_inter_get_mv_cand and
kvz_inter_get_merge_cand to kvz_inter_get_spatial_merge_candidates.
Remove the need to count the coefficients by populating the significant
coefficient group map first and finding the last coefficient from the
last group afterward. The speedup is about 2% on ultrafast.
The previous version of this patch was reverted due to a bug, which
has now been fixed.
This reverts commit 25462124f8.
That commit broke the bitstream. If it's not good enough to push on Friday
night, it's probably not good enough on Monday morning either.
Remove the need to count the coefficients by populating the significant
coefficient group map first and finding the last coefficient from the
last group afterward.
Increases the MV safety margin of OWF from 2 to 3 when deblocking
is used and 4 when both deblocking and FME are used.
Fractional pixel motion estimation can move the vector one more pixel
down causing checksum error. This fixes that error by increasing the
OWF safety margin and changes the interface, so that different margin
can be used when FME or deblocking are not in use.
Replaces repetitive calls to kvz_filter_deblock_luma and
kvz_filter_deblock_chroma with loops in functions
filter_deblock_edge_luma and filter_deblock_edge_chroma.
Marks the following functions static and removes them from filter.h
since they are not used outside the filter module.
- kvz_filter_deblock_luma
- kvz_filter_deblock_chroma
- kvz_filter_deblock_edge_luma
- kvz_filter_deblock_edge_chroma
- kvz_filter_deblock_cu
- Replace parameter depth in kvz_filter_deblock_edge_{luma,chroma} with
length.
- Move checking whether an edge needs to be filtered from functions
kvz_filter_deblock_edge_{luma,chroma} to functions
kvz_filter_deblock_{cu,lcu}.
- Use pixel coordinates instead of 8-pixel block coordinates in
kvz_filter_deblock_cu.
- Add comments.
These changes should make it easier to modify the deblocking filter to
handle SMP and AMP blocks.