DST function was returned for inter luma transform blocks of size 4x4
even though they must use DCT. Fixed by checking the prediction mode of
the block in addition to whether it is chroma or luma.
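The corrected selection rule can be sketched as follows. This is a minimal illustration, not the encoder's actual code: every type and function name below is hypothetical. It relies only on the rule stated above, that DST applies to 4x4 luma blocks only when the block is intra predicted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not kvazaar's identifiers. */
typedef enum { SKETCH_COLOR_Y, SKETCH_COLOR_U, SKETCH_COLOR_V } sketch_color_t;
typedef enum { SKETCH_CU_INTRA, SKETCH_CU_INTER } sketch_cu_type_t;
typedef enum { SKETCH_TRANSFORM_DCT, SKETCH_TRANSFORM_DST } sketch_transform_t;

/* Pick the transform for a square block of the given width. */
static sketch_transform_t sketch_select_transform(sketch_color_t color,
                                                  sketch_cu_type_t type,
                                                  int width)
{
  assert(width == 4 || width == 8 || width == 16 || width == 32);

  /* Checking only size and color would wrongly give DST to inter blocks,
   * so the prediction mode is part of the condition. */
  bool use_dst = width == 4
              && color == SKETCH_COLOR_Y
              && type == SKETCH_CU_INTRA;

  return use_dst ? SKETCH_TRANSFORM_DST : SKETCH_TRANSFORM_DCT;
}
```

With this rule, a 4x4 inter luma block gets SKETCH_TRANSFORM_DCT, as do all chroma blocks and all blocks larger than 4x4.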
When 4x4 intra blocks are enabled and inter search is limited to 16x16
and larger blocks, it is possible that inter search is accidentally done
for 4x4 blocks. Fixed by checking that block size is at least 8x8 before
doing inter search.
Overrides the linkers used for kvazaar, libkvazaar.la and kvazaar_tests.
When crypto++ is enabled, the C++ linker is used and when it is
disabled, the C linker is used.
This removes the need to explicitly specify -lstdc++ in configure when
crypto++ is used and fixes the build with crypto++ when libstdc++ is not
installed.
- Fixes two errors in calculating the POC for the reference frame for
temporal candidate MV scaling.
- Fixes using the MV for the wrong direction when the temporal MV
predictor block uses bi-prediction.
Fixes #160.
Changes handling of intra pictures for --gop=8 so that every picture
with POC divisible by the intra period is intra. The first picture is
IDR and the rest of the intra pictures are CRA. POC is not reset at CRA
pictures. The leading pictures that follow the CRA picture are changed
to RASL so they are allowed to refer to pictures before the CRA picture.
Changes inter slice types to P when the L1 reference list is empty and
to B otherwise.
In all-intra, all pictures are now IDR pictures with POC zero.
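The picture and slice type rules described above can be sketched roughly as below. The enum and function names are hypothetical and the sketch covers only the rules stated in this message; it is not the encoder's actual decision code.

```c
#include <assert.h>

/* Hypothetical names for illustration only. */
typedef enum { SKETCH_PIC_IDR, SKETCH_PIC_CRA, SKETCH_PIC_INTER } sketch_pic_type_t;
typedef enum { SKETCH_SLICE_I, SKETCH_SLICE_P, SKETCH_SLICE_B } sketch_slice_type_t;

/* Every picture whose POC is divisible by the intra period is intra.
 * The first picture is IDR, later intra pictures are CRA. */
static sketch_pic_type_t sketch_pic_type(int poc, int intra_period)
{
  assert(poc >= 0 && intra_period > 0);
  if (poc == 0 || intra_period == 1) return SKETCH_PIC_IDR; /* first picture, or all-intra */
  if (poc % intra_period == 0) return SKETCH_PIC_CRA;
  return SKETCH_PIC_INTER;
}

/* Inter slices are P when the L1 reference list is empty, B otherwise. */
static sketch_slice_type_t sketch_slice_type(sketch_pic_type_t pic, int l1_ref_count)
{
  assert(l1_ref_count >= 0);
  if (pic != SKETCH_PIC_INTER) return SKETCH_SLICE_I;
  return l1_ref_count == 0 ? SKETCH_SLICE_P : SKETCH_SLICE_B;
}
```

For example, with an intra period of 64, POC 0 maps to IDR, POC 64 to CRA and POC 8 to an inter picture whose slice type depends on whether L1 has any references.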
When using --gop=8 with an intra period greater than one, a single POC
would be skipped before every intra frame. This commit fixes the problem
by turning the intra frames into BLA frames with leading pictures when
using --gop=8.
Changes thread queue so that only the jobs that are ready to run are
stored in the queue. Other jobs are kept track of by pointers in the
reverse dependency lists of other jobs. When a job is ready to run it is
appended to the queue. The job queue is stored as a linked list.
The definitions of threadqueue_queue_t and threadqueue_job_t are moved
to the .c file, turning them into opaque structs.
Makes thread queue code simpler. Fixes some TSan errors.
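The queue design described above can be illustrated with a small single-threaded sketch. All names are hypothetical, locking and worker threads are left out, and the fixed-size reverse dependency array is an assumption made to keep the example short. It shows only the core idea: a job enters the linked-list queue once its dependency count reaches zero, and until then it is reachable only through the reverse dependency lists of the jobs it waits for.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_RDEPS 4 /* assumed limit, for brevity */

typedef struct sketch_job {
  int ndepends;                                  /* unfinished dependencies */
  struct sketch_job *rdepends[SKETCH_MAX_RDEPS]; /* jobs waiting for this one */
  int rdepends_count;
  struct sketch_job *next;                       /* link in the ready queue */
} sketch_job_t;

typedef struct {
  sketch_job_t *first;
  sketch_job_t *last;
} sketch_queue_t;

/* Append a ready job to the end of the linked list. */
static void sketch_push(sketch_queue_t *q, sketch_job_t *job)
{
  job->next = NULL;
  if (q->last) q->last->next = job; else q->first = job;
  q->last = job;
}

/* Take the first ready job, or NULL when nothing is ready. */
static sketch_job_t *sketch_pop(sketch_queue_t *q)
{
  sketch_job_t *job = q->first;
  if (job) {
    q->first = job->next;
    if (!q->first) q->last = NULL;
    job->next = NULL;
  }
  return job;
}

/* Record that `job` must wait for `dep`. Call before submitting `job`. */
static void sketch_add_dependency(sketch_job_t *job, sketch_job_t *dep)
{
  assert(dep->rdepends_count < SKETCH_MAX_RDEPS);
  dep->rdepends[dep->rdepends_count++] = job;
  job->ndepends++;
}

/* Only jobs that are ready to run are stored in the queue. */
static void sketch_submit(sketch_queue_t *q, sketch_job_t *job)
{
  if (job->ndepends == 0) sketch_push(q, job);
}

/* Called when `job` has finished: release the jobs waiting for it. */
static void sketch_finish(sketch_queue_t *q, sketch_job_t *job)
{
  for (int i = 0; i < job->rdepends_count; i++) {
    sketch_job_t *waiting = job->rdepends[i];
    assert(waiting->ndepends > 0);
    if (--waiting->ndepends == 0) sketch_push(q, waiting);
  }
}
```

A worker never has to scan past blocked jobs, since everything in the list is runnable. A real implementation would also need a mutex around these operations and a way to handle jobs whose dependencies finish before they are submitted.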
Adds struct inter_search_info_t for holding the parameters that are used
by most functions related to inter search. Passing the parameters in
a single struct greatly reduces the number of parameters for many
functions.
Functions kvz_sao_reconstruct and encoder_sao_reconstruct used
frame->width as the stride instead of frame->rec->stride when accessing
frame->rec->data. This caused errors when using tiles and SAO.
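The difference between width and stride can be shown with a small sketch. The struct and function here are hypothetical stand-ins, not kvazaar's picture type. The assumption is that a plane may be a view into a larger buffer, in which case consecutive rows are stride pixels apart rather than width pixels apart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical picture plane; `data` may point into a larger buffer. */
typedef struct {
  const uint8_t *data;
  int width;   /* visible pixels per row */
  int height;
  int stride;  /* distance between the starts of consecutive rows */
} sketch_plane_t;

/* Rows must be stepped by stride. Stepping by width reads the wrong
 * pixels whenever stride != width. */
static uint8_t sketch_get_pixel(const sketch_plane_t *p, int x, int y)
{
  assert(x >= 0 && x < p->width);
  assert(y >= 0 && y < p->height);
  return p->data[y * p->stride + x];
}
```

For a 4 pixel wide plane stored with a stride of 8, pixel (1, 1) lives at offset 9, whereas indexing with the width would read offset 5.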
Changes the work_tree parameter in search.c functions from an array to
a pointer. Fixes "formal parameter with requested alignment of 8 won't
be aligned" errors.
Changes field state->tile->frame->cu_array->data to point to the CU
array in the main encoder state. Removes the need to copy the CU array
to the main CU array after search.
Inter costs are computed using SAD except when fractional motion
estimation or bi-prediction is enabled. This commit changes
search_pu_inter_ref to recalculate the cost with SATD. Fixes inter/intra
cost comparisons since intra costs are always SATD costs.
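To illustrate why SAD and SATD costs are not directly comparable, here is a minimal 4x4 sketch of both metrics. The function names are hypothetical, and the final halving of the SATD sum is a commonly used normalization that is assumed here rather than taken from the encoder.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a 4x4 block. */
static unsigned sketch_sad_4x4(const uint8_t *a, const uint8_t *b, int stride)
{
  assert(stride >= 4);
  unsigned sum = 0;
  for (int y = 0; y < 4; y++) {
    for (int x = 0; x < 4; x++) {
      sum += (unsigned)abs(a[y * stride + x] - b[y * stride + x]);
    }
  }
  return sum;
}

/* Sum of absolute Hadamard-transformed differences over a 4x4 block. */
static unsigned sketch_satd_4x4(const uint8_t *a, const uint8_t *b, int stride)
{
  assert(stride >= 4);
  int t[4][4];

  /* Horizontal butterflies on each row of the difference block. */
  for (int y = 0; y < 4; y++) {
    int d0 = a[y * stride + 0] - b[y * stride + 0];
    int d1 = a[y * stride + 1] - b[y * stride + 1];
    int d2 = a[y * stride + 2] - b[y * stride + 2];
    int d3 = a[y * stride + 3] - b[y * stride + 3];
    int s02 = d0 + d2, s13 = d1 + d3, m02 = d0 - d2, m13 = d1 - d3;
    t[y][0] = s02 + s13;
    t[y][1] = s02 - s13;
    t[y][2] = m02 + m13;
    t[y][3] = m02 - m13;
  }

  /* Vertical butterflies on each column, accumulating magnitudes. */
  unsigned sum = 0;
  for (int x = 0; x < 4; x++) {
    int s02 = t[0][x] + t[2][x], s13 = t[1][x] + t[3][x];
    int m02 = t[0][x] - t[2][x], m13 = t[1][x] - t[3][x];
    sum += (unsigned)(abs(s02 + s13) + abs(s02 - s13)
                    + abs(m02 + m13) + abs(m02 - m13));
  }

  return (sum + 1) >> 1; /* assumed normalization */
}
```

The two metrics respond differently to the same residual. A uniform difference of 2 over the whole block gives SAD 32 and SATD 16, while a difference of 2 in a single pixel gives SAD 2 and SATD 16. Comparing a SAD cost for one mode against a SATD cost for another therefore biases the decision.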
Changes function kvz_get_coeff_cost to only copy the CABAC contexts and
not the whole encoder state.
Other threads could be simultaneously using the other parts of the
encoder state. Copying only the CABAC contexts fixes a TSan data race
warning.
Adds alignment attribute to lcu_coeff_t. The coefficients are sometimes
handled as 64-bit integers containing four coefficients so the arrays
should be aligned to 8 bytes.
Fixes a UBSan error about misaligned reads.
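A minimal sketch of the idea, using the C11 _Alignas specifier. The struct layout, array sizes and helper below are hypothetical; the real lcu_coeff_t and the attribute macro used in the code base may look different.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int16_t sketch_coeff_t;

/* Hypothetical coefficient storage. Each array starts on an 8-byte
 * boundary, so every group of four 16-bit coefficients is aligned for
 * a 64-bit access. */
typedef struct {
  _Alignas(8) sketch_coeff_t y[16];
  _Alignas(8) sketch_coeff_t u[4];
} sketch_coeffs_t;

/* Returns 1 if any of four consecutive coefficients is nonzero by
 * inspecting them as one 64-bit word. */
static int sketch_any_nonzero4(const sketch_coeff_t *c)
{
  /* Misaligned pointers are what the sanitizer complained about. */
  assert((uintptr_t)c % 8 == 0);
  uint64_t word;
  memcpy(&word, c, sizeof word); /* memcpy sidesteps aliasing issues */
  return word != 0;
}
```

Without the alignment specifier, an array of 16-bit values is only guaranteed 2-byte alignment, so a direct 64-bit load from it is undefined behavior on a misaligned address.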
Changes OWF selection so that it is chosen based on the maximum number
of parallel CTUs. Number of threads is limited to prevent overhead from
extra threads.
Drops pthread_cond_broadcast on threadqueue->cond in function
kvz_threadqueue_waitfor. The broadcast caused threads to be woken up
more often than necessary.
Changes encoder_state_init_new_frame to only call normalize_lcu_weights
when the weights have been written to the array and rate control is
enabled. When rate control is disabled, the weights are not used.
Adds field max_inter_ref_lcu to encoder_control_t. It is used to set up
inter-LCU dependencies in encoder_state_encode_leaf and restrict motion
vectors in fracmv_within_tile.
When using WPP and OWF, the first CTU of a row depends on the last CTU
of the row below in the reference frame. This is necessary when SAO is
enabled since we currently do SAO for a whole CTU row at a time. When
SAO is disabled, however, it is unnecessary to wait for the whole row.
Changes CTUs to depend only on the CTU below in the reference frame
instead of the whole row when WPP and OWF are enabled and SAO disabled.
Gives a significant speedup when running on a machine with many CPU
cores.
Moves SAO reconstruction into encoder_state_worker_encode_lcu instead of
doing it in a separate step for the whole CTU row. Reconstruction of the
rightmost 10 pixels and bottommost 10 pixels of a CTU is delayed until
the neighboring CTU has been deblocked.
Doing SAO for the whole CTU row at a time caused unnecessary inter-CTU
dependencies when using WPP and OWF. The first CTU of a row would need
to wait until SAO was done for the row below in the previous frame.
Moving SAO reconstruction to immediately after deblocking each CTU fixes
this problem.
Adds width and height parameters to function kvz_sao_reconstruct and
changes it to take coordinates in units of pixels. This will be useful
for doing SAO for areas smaller than a whole CTU.
AVX2 filter functions read pixels in chunks of 8 or 16 bytes. At the end
of the block, the read goes out of the bounds of the pixels array. The
extra pixels do not affect the result.
Fixes valgrind complaining about the invalid reads by allocating 5 extra
pixels in kvz_get_extended_block_avx2.
With low delay GOP structure, it is possible to use an intra period that
is not a multiple of the GOP structure length. Commit 00c9f52 changed
encoder_state_init_new_frame to reset POC on intra frames. GOP offset,
however, was not reset, resulting in invalid POCs and references for the
following frames.
This commit changes function kvz_encoder_feed_frame so that GOP offset
is correctly reset on intra frames.
When closing the encoder, the pictures stored in the input frame buffer
are freed by repeatedly calling kvz_encoder_feed_frame. If the encoder
was closed immediately after opening it, kvz_encoder_feed_frame would be
called with an unprepared encoder state. This would trigger an assert.
Fixed by changing kvz_encoder_feed_frame so that it does not require the
encoder state to be prepared.
Sets max_transform_hierarchy_depth_inter to 0 in SPS. This saves some
bits because split_transform_flag does not need to be coded for inter
blocks.
When SMP and AMP blocks are enabled, the depth is set to 1 instead.
Otherwise inter split flag would default to 1 for SMP and AMP blocks,
resulting in an unnecessary transform split.
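The reasoning can be sketched as follows. The function names are hypothetical. The inference rule for the inter split flag follows my reading of the HEVC specification, and treating "SMP and AMP enabled" as "either one enabled" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Value for max_transform_hierarchy_depth_inter in the SPS.
 * Assumption: enabling either SMP or AMP is enough to require 1. */
static int sketch_max_tr_depth_inter(bool smp_enabled, bool amp_enabled)
{
  return (smp_enabled || amp_enabled) ? 1 : 0;
}

/* Inferred transform split at the CU root: with a maximum depth of 0,
 * an inter CU that is not 2Nx2N is forced to split its transform. */
static bool sketch_inter_split_flag(int max_tr_depth_inter,
                                    bool is_inter,
                                    bool is_2Nx2N,
                                    int tr_depth)
{
  assert(max_tr_depth_inter >= 0 && tr_depth >= 0);
  return max_tr_depth_inter == 0 && is_inter && !is_2Nx2N && tr_depth == 0;
}
```

With a depth of 0, a non-2Nx2N inter CU always gets a transform split. With a depth of 1 the split is no longer inferred, so the encoder is free to signal no split for SMP and AMP blocks.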
Changes kvazaar_close to stop all threads before freeing encoder states.
Fixes a crash when the encoder is closed before all pictures have been
encoded.