The kvz_config struct is created by the user but kvazaar keeps a pointer
to it. It is easy to break things by modifying the configuration outside
kvazaar. In addition, kvazaar modifies the struct even though it has
a const modifier.
This commit changes the field cfg in encoder_control_t to be a copy of
the kvz_config struct instead of a pointer, removing modifications to
the const struct and allowing users to do whatever they want with it
after opening the encoder.
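
The ownership change can be sketched as follows. This is a minimal
illustration, not the actual kvazaar code: example_config,
example_encoder_control and example_encoder_open are hypothetical,
heavily reduced stand-ins for kvz_config, encoder_control_t and the
encoder open call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for kvz_config. */
typedef struct {
  int32_t width;
  int32_t height;
  int32_t qp;
} example_config;

/* Hypothetical, reduced stand-in for encoder_control_t. */
typedef struct {
  example_config cfg;  /* owned copy, not a pointer to the caller's struct */
} example_encoder_control;

/* Open the encoder by copying the caller's configuration by value. */
static void example_encoder_open(example_encoder_control *ctrl,
                                 const example_config *user_cfg)
{
  assert(ctrl != NULL && user_cfg != NULL);
  ctrl->cfg = *user_cfg;  /* struct assignment copies every field */
}
```

With a copy, later changes to the caller's struct no longer reach the
encoder, and the encoder never has to write through a const pointer.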
This encapsulates tiles or WPP rows into their own slices, making
it possible to send them as soon as they are done, instead of waiting
for the other substreams to finish and coding the substream offsets
in the slice header.
Appending to the child stream doesn't work if the child is a leaf
slice state.
Simplifies flow by removing distinction between tile and slice. Now
that slice headers are written in the parent stream, there is zero
difference between tiles and slices from the bitstream point of view.
Having some of the termination bits in the LCU coding and some in the
substream finalization was needlessly confusing. Doing substream
finalization directly after LCU coding makes it easy to verify that the
finalization is done correctly.
Removes one job per WPP row from the job queue.
Removes kvz_cabac_flush, because I don't like bits being put into the
bitstream implicitly. Better to have it all in the open.
Stops assuming that having cfg->gop_lowdelay set means that GOP
structure is used since it is possible that cfg->gop_lowdelay is true
but cfg->gop_len is zero. Adds checks for cfg->gop_len where needed.
Fixes a possible division by zero in kvz_encoder_feed_frame.
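
A guard of the kind described might look like the sketch below. The
struct and the function example_gop_offset are hypothetical; only the
field names gop_len and gop_lowdelay come from the text above, and the
actual expression that divided by zero in kvz_encoder_feed_frame is not
reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced config with only the two fields discussed above. */
typedef struct {
  int8_t gop_len;       /* 0 means no GOP structure is in use */
  int8_t gop_lowdelay;  /* may be nonzero even when gop_len is 0 */
} example_config;

/* Return the position of a frame inside its GOP, or 0 when no GOP is used. */
static int example_gop_offset(const example_config *cfg, int frame_num)
{
  assert(cfg != NULL && frame_num >= 0);
  if (cfg->gop_len <= 0) {
    return 0;  /* guard: avoids computing frame_num % 0 */
  }
  return frame_num % cfg->gop_len;
}
```

The check is on gop_len alone, so the result does not depend on whether
gop_lowdelay happens to be set.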
Adds field lcu_stats to encoder_state_config_frame_t. The following data
is recorded for each LCU:
- number of bits
- squared cost
- used lambda value
- alpha parameter used for rate control
- beta parameter used for rate control
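
As a rough picture of the per-LCU record, assuming one entry per LCU in
raster order: the type name, field names, field types and the allocation
helper below are all hypothetical and chosen only to mirror the list
above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-LCU statistics record mirroring the list above. */
typedef struct {
  uint32_t bits;         /* number of bits used for the LCU */
  double   squared_cost; /* squared cost */
  double   lambda;       /* lambda value used */
  double   rc_alpha;     /* alpha parameter used for rate control */
  double   rc_beta;      /* beta parameter used for rate control */
} example_lcu_stats;

/* Allocate one zeroed record per LCU; the caller frees the result. */
static example_lcu_stats *example_lcu_stats_alloc(int width_in_lcu,
                                                  int height_in_lcu)
{
  assert(width_in_lcu > 0 && height_in_lcu > 0);
  return calloc((size_t)width_in_lcu * (size_t)height_in_lcu,
                sizeof(example_lcu_stats));
}
```

The record for the LCU at column x and row y would then live at index
y * width_in_lcu + x.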
When rate control is enabled, enable cu_qp_delta_enabled_flag in PPS
with diff_cu_qp_delta_depth set to 0. Also adds code for writing the QP
deltas and a new cabac context.
Adds fields lambda, lambda_sqrt and qp to encoder_state_t. Drops field
cur_lambda_cost_sqrt from encoder_state_config_frame_t and renames
cur_lambda_cost to lambda.
Write bitstream without chroma when encoding with --input-format=P400.
This reduces bitstream size by 0-1 %, compared to coding monochrome in
420 format, and speeds up encoding slightly due to not processing
chroma.
Changes encoder_set_source_picture to set the reconstructed picture to
a copy of the source picture instead of allocating a new picture when
lossless coding is used.
- Moves allocation of the reconstructed picture after the source picture
is set.
- Extracts main state initialization to a separate function from
encoder_state_new_frame.
- Changes kvz_encoder_feed_frame to return the frame.
- Renames some functions to better match their purpose.
When --lossless is given, set cu_transquant_bypass_flag for every CU and
bypass transform and quantization by directly copying reference pixels
to reconstruction and the residual to coefficients.
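
The bypass can be sketched as below. This assumes that "reference
pixels" means the original source pixels, that samples are 8 bits wide
and that the residual is source minus prediction; the function name and
signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lossless path for one block of n samples: no transform, no
 * quantization. The residual is stored as the coefficients and the
 * source samples are copied straight into the reconstruction. */
static void example_lossless_bypass(const uint8_t *src, const uint8_t *pred,
                                    uint8_t *rec, int16_t *coeff, int n)
{
  assert(src != NULL && pred != NULL && rec != NULL && coeff != NULL);
  for (int i = 0; i < n; ++i) {
    coeff[i] = (int16_t)(src[i] - pred[i]);  /* residual as coefficient */
    rec[i] = src[i];                         /* exact reconstruction */
  }
}
```

A decoder adding the coefficients back to the same prediction recovers
the source exactly, which is why the reconstruction can simply be a
copy.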
Enables search for 2NxN and Nx2N partition modes for 8x8 CUs and 2NxnU,
2NxnD, nLx2N and nRx2N partition modes for 16x16 CUs.
Changes the loop for copying reconstructed luma pixels in
kvz_inter_recon_lcu to use 4 byte chunks instead of 8 byte chunks since
it is now possible to have 4 pixel wide blocks.
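
A copy loop with 4 byte chunks might look like this sketch. It assumes
8-bit pixels, so 4 bytes equal 4 pixels, and block widths that are
multiples of 4; the function name and parameters are hypothetical and
not taken from kvz_inter_recon_lcu.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a width x height block of 8-bit pixels, 4 pixels at a time.
 * An 8 byte chunk would overrun a 4 pixel wide block. */
static void example_copy_block(uint8_t *dst, int dst_stride,
                               const uint8_t *src, int src_stride,
                               int width, int height)
{
  assert(width > 0 && width % 4 == 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; x += 4) {
      memcpy(&dst[y * dst_stride + x], &src[y * src_stride + x], 4);
    }
  }
}
```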
This problem resulted in an illegal bitstream with --gop=lp, because it
uses IDRs. With --gop=8, IDR pictures were not coded even when requested
with -p, which masked this problem.
This fix solves the problem with --gop=lp and also prevents references
across the intra picture in --gop=8. The intra pictures should be set
to IDR in a later fix, or an alternate method of differentiating
between IDR and non-IDR intra should be made.
The includes should make more sense now and not just happen to compile
due to headers included from other headers.
Used a modified version of IWYU. Modifications were to attribute int8_t
and so on to stdint.h instead of sys/types.h and immintrin.h instead of
more specific headers.
include-what-you-use 0.7 (git:b70df35)
based on clang version 3.9.0 (trunk 264728)
I was a bit unclear about exactly what happens and when regarding SAO
and deblocking when we do frame-parallel WPP parallelism, so I checked
and commented the bits that were unclear to me.
There was an off-by-one error in the dependency setting code, which
left some dependencies unset and caused checksum errors, for example
with ref_neg=1 and owf=1.
Moves sao search from function encoder_state_worker_encode_lcu in
encoderstate.c to function kvz_sao_search_lcu in sao.c. Makes functions
kvz_init_sao_info, kvz_sao_search_chroma and kvz_sao_search_luma static
since they are no longer used outside sao.c.
CU data was being copied to the wrong place in the reference frame's
cu_array, which led to uninitialized data being used as a starting
point for motion vector search.
Fixes #99.
Add dependency to the reference frame instead of the previous frame,
in order to allow more frames to be encoded in parallel when temporal
stepping is greater than 1 in an LP GOP (such as --gop=lp-g8d4r1t2).
Prevents a conflict with config.h and src/config.h so that the config.h
generated by configure is included in global.h. Fixes problems with
large input files on 32-bit systems.
Add module information to all header files.
Update the documentation of all header files to briefly say what they
are, and to use the javadoc format so the brief actually gets included
in the doxygen documentation.
Remove \file from implementation files, in order to not repeat the info
from the header files.
Add files under strategies and tools to Doxygen and update the Doxygen
settings to be just plain better.
Make README be the main page of Doxygen documentation.
Remove the need to count the coefficients by populating the significant
coefficient group map first and finding the last coefficient from the
last group afterward. The speedup is about 2% on ultrafast.
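
The idea can be sketched as follows, assuming the coefficients are
already in scan order and grouped into 4x4 coefficient groups of 16.
The function name, the constant and the layout are hypothetical and do
not reproduce the actual kvazaar coefficient coding routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_CG_SIZE 16  /* coefficients per 4x4 coefficient group */

/* Fill sig_cg with one flag per group and return the scan index of the
 * last nonzero coefficient, or -1 if the block is all zero. Only the
 * last significant group is searched backward, so no separate pass
 * counting every coefficient is needed. */
static int example_find_last_coeff(const int16_t *coeff, int num_groups,
                                   uint8_t *sig_cg)
{
  assert(coeff != NULL && sig_cg != NULL && num_groups > 0);
  int last_cg = -1;
  for (int cg = 0; cg < num_groups; ++cg) {
    sig_cg[cg] = 0;
    for (int i = 0; i < EXAMPLE_CG_SIZE; ++i) {
      if (coeff[cg * EXAMPLE_CG_SIZE + i] != 0) {
        sig_cg[cg] = 1;
        last_cg = cg;
        break;  /* one nonzero coefficient is enough to mark the group */
      }
    }
  }
  if (last_cg < 0) {
    return -1;
  }
  for (int i = EXAMPLE_CG_SIZE - 1; i >= 0; --i) {
    if (coeff[last_cg * EXAMPLE_CG_SIZE + i] != 0) {
      return last_cg * EXAMPLE_CG_SIZE + i;
    }
  }
  return -1;  /* not reached: last_cg holds at least one nonzero value */
}
```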
The previous version of this patch was reverted due to a bug, which
has now been fixed.
This reverts commit 25462124f8.
That commit broke the bitstream. If it's not good enough to push on Friday
night, it's probably not good enough on Monday morning either.
Remove the need to count the coefficients by populating the significant
coefficient group map first and finding the last coefficient from the
last group afterward.
Changes main function to compute frame PSNR by calling
kvz_videoframe_compute_psnr directly with the source and reconstructed
pictures returned from encoder_encode.
The code for building the reference picture lists was duplicated in
functions encoder_state_ref_sort and print_frame_info. This commit moves
it to a new function kvz_encoder_get_ref_lists. Also makes
encoder_ref_insertion_sort static since it is not used outside the
encoderstate module any more.
This bug caused a single tile's worth of lcu_info_t structs to be copied
unnecessarily for every LCU in the frame. This obviously caused huge
memory bandwidth issues when coding large frames without tiles. The
effect was minimized somewhat with a large number of tiles, because
only the current tile was copied.
From context it is clear that this piece of code was supposed to copy
a single tile or frame, once the frame was done, but because it was
placed in a function which is called for every LCU, it copied the data
for the LCU, but also lots of extra stuff.
The fix is to copy only the current LCU instead of the whole tile.
A call to kvz_threadqueue_waitfor caused the tqj_bitstream_written field
of the previous encoder state to become a dangling pointer, subsequently
causing an assertion to fail. This would only occur when the encoder
state used for a new frame was not the last finished one.
Fixed by setting tqj_bitstream_written to NULL after the job is done and
removing unnecessary calls to kvz_threadqueue_waitfor.
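
The pointer reset follows the usual pattern sketched below. Only the
field name tqj_bitstream_written comes from the text above; the types
and the function are hypothetical, and free() merely stands in for
however the thread queue releases a finished job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a thread queue job and an encoder state. */
typedef struct {
  int done;
} example_job;

typedef struct {
  example_job *tqj_bitstream_written;
} example_state;

/* Release the finished job and clear the pointer so that later code
 * cannot touch freed memory. Safe to call more than once. */
static void example_finish_bitstream_job(example_state *state)
{
  assert(state != NULL);
  if (state->tqj_bitstream_written != NULL) {
    free(state->tqj_bitstream_written);
    state->tqj_bitstream_written = NULL;
  }
}
```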
- Removes all bitstream types.
- Changes encoder_encode to return the encoded data as a list of chunks.
- Moves writing of the encoded data to the main function.
- Replaces read_one_frame by encoder_feed_frame.
- Adds field "prepared" to encoderstate_t to indicate that
encoder_next_frame has been called.
- Input frames are read in the main function and passed to
encoder_encode.
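
Returning the output as a list of chunks and writing it from the caller
could look roughly like this. The chunk type, its size and both helper
functions are hypothetical and only illustrate the shape of such an
interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_CHUNK_SIZE 4096  /* hypothetical chunk capacity */

/* Hypothetical singly linked chunk of encoded data. */
typedef struct example_chunk {
  struct example_chunk *next;
  uint8_t data[EXAMPLE_CHUNK_SIZE];
  uint32_t len;  /* number of valid bytes in data */
} example_chunk;

/* Sum the valid bytes over the whole list. */
static size_t example_chunks_total_len(const example_chunk *head)
{
  size_t total = 0;
  for (const example_chunk *c = head; c != NULL; c = c->next) {
    assert(c->len <= EXAMPLE_CHUNK_SIZE);
    total += c->len;
  }
  return total;
}

/* Write every chunk to out in list order; returns the bytes written
 * and stops early on a short write. */
static size_t example_write_chunks(FILE *out, const example_chunk *head)
{
  size_t written = 0;
  for (const example_chunk *c = head; c != NULL; c = c->next) {
    size_t n = fwrite(c->data, 1, c->len, out);
    written += n;
    if (n != c->len) {
      break;
    }
  }
  return written;
}
```

With this split the encoder never touches a file handle; the main
function decides where the bytes go.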