Limits video size so that the number of luma and chroma pixels can be
stored in an int. Fixes some integer overflows that resulted in
segmentation faults.
Finalization functions for frame and tile encoder states accessed the
frame and tile fields of the encoder state even though they might be
NULL. This is the case when the initialization of an encoder state
fails. Fixed by adding NULL checks.
Some of the fields in encoder_control_t were simply copies of the
corresponding fields in kvz_config. This commit drops the copied fields
in favor of using the fields in encoder_control_t.cfg directly.
The kvz_config struct is created by the user but kvazaar keeps a pointer
to it. It is easy to break things by modifying the configuration outside
kvazaar. In addition, kvazaar modifies the struct even though it is has
a const modifier.
This commit changes the field cfg in encoder_control_t to be a copy of
the kvz_config struct instead of a pointer, removing modifications to
the const struct and allowing users to do whatever they want with it
after opening the encoder.
The end of slice was being calculated incorrectly, which led to no tile
being created inside the slice, which led to an assert triggering.
This fixes the wrong end of slice calculation, but also disallows
wavefront rows from being created, if there would be only one.
The wavefront initialization code assumes there are always more than
one row, so the inter-frame dependency doesn't get added properly.
Fixes#153.
Main thread was stuck looping on pthread_cond_timedwait because
the abs time given on OS-X had already passed and the wait
returned immediately without releasing the mutex to allow worker
threads to proceed.
Fix was to use the gettimeofday, which returns real time instead
of monotonic, which is what pthread_cond_timedwait wants.
rdo.c:475:25: warning: absolute value function 'abs' given an argument of type 'int64_t' (aka 'long long') but has parameter
of type 'int' which may cause truncation of value [-Wabsolute-value]
current.cost = -abs(quant_cost_in_bits) + (bits << PRECISION_INC);
^
rdo.c:475:25: note: use function 'llabs' instead
current.cost = -abs(quant_cost_in_bits) + (bits << PRECISION_INC);
cfg.c:1024:74: warning: format specifies type 'size_t' (aka 'unsigned
long') but the argument has type 'unsigned long long'
[-Wformat]
fprintf(stderr, "Too large ROI size: %llu (maximum %zu).\n", size, SIZE_MAX);
This encapsulates tiles or WPP rows into their own slices, making
it possible to send them as soon as they are done, instead of waiting
for the other substreams to finish and coding the substream offsets
in the slice header.
These fixes allow more than one slice to be used to code a picture.
- Use correct number of bits to code the slice segment address.
- Don't offset_len_minus1 for slices without substreams.
Appending to the child stream doesn't work is the child is a leaf
slice state.
Simplifies flow by removing distinction between tile and slice. Now
that slice headers are written in the parent stream, there is zero
difference between tiles and slices from bitstream point of view.
Having some of the termination bits in the LCU coding and some in the
substream finalization was needlessly confusing. Doing substream
finalization directly after LCU coding makes it easy to verify that the
finalization is done correctly.
Removes one job per WPP row from the job queue.
Removes kvz_cabac_flush, because I don't like bits being put into the
bitstream implicitly. Better to have it all in the open.
When the number of merge candidates was five, biprediction search would
read past the bounds of the priority list arrays. Fixed to limit the
search to the first four candidates.
- Checks the return value of fopen when opening the ROI file. Fixes
a segfault when the file cannot be opened.
- Check that the width and height are positive. Fixes reading past the
end of the delta QP array in kvz_set_lcu_lambda_and_qp.
- Check for overflow in width * height. Fixes an overflow resulting in
a segfault.
- Properly check that fscanf succeeds. Fixes silently accepting ROI
files that are too short.
- Properly close the FILE pointer.
The function search_pu_inter_ref incorrectly rounded the coordinates of
the block to down to a multiple 8 pixels. Small SMP and AMP blocks may
start at coordinates that are not multiples of 8. Fixed by removing the
rounding.
Fixes a failing assert when --mv-constraint is used with --smp or --amp.
Stops assuming that having cfg->gop_lowdelay set means that GOP
structure is used since it is possible that cfg->gop_lowdelay is true
but cfg->gop_len is zero. Adds checks for cfg->gop_len where needed.
Fixes a possible division by zero in kvz_encoder_feed_frame.
Drop a conditional for depth > MAX_DEPTH in search_cu. The depth cannot
be greater than MAX_DEPTH (== 3) since an earlier if-clause checks that
it is less than MAX_PU_DEPTH (== 4).
Subpixel motion estimation return 0-vector when no subpixel vector is
within the constraint. Fix is to not call subpixel motion estimation
when the integer vector is not within the constraint.
Rename and reorder everything to make more sense.
- Moved input tables into their own struct and renamed them to what
they actually represent.
- Renamed pretty much every variable to comform to our style and
to make sense.
- Removed the lastCG stuff, as the function already gets passed the
last coeff anyway. (it was named width, what the hell?)
This is to prepare for changing the code using the floating point table
to use the fixed point table instead.
This also allows reducing the size of the fractional part, which was
useful for finding every place where the the fixed point presentation
is relied upon.
Changes luma deblocking to use gather and scatter instead of reading
to and writing from here and there in memory. Should make them
faster and easier to vectorize, or at least cleaner.
Splits strong and weak luma deblocking to two functions, as they have
almost nothing in common.
Adds field lcu_stats to encoder_state_config_frame_t. The following data
is recorded for each LCU:
- number of bits
- squared cost
- used lambda value
- alpha parameter used for rate control
- beta parameter used for rate control
When rate control is enabled, enable cu_qp_delta_enabled_flag in PPS
with diff_cu_qp_delta_depth set to 0. Also adds code for writing the QP
deltas and a new cabac context.
Adds fields lambda, lambda_sqrt and qp to encoder_state_t. Drops field
cur_lambda_cost_sqrt from encoder_state_config_frame_t and renames
cur_lambda_cost to lambda.
- Defines MIN_LAMBDA and MAX_LAMBDA constants.
- Moves resetting state->frame->cur_gop_bits_coded to rate_control.c.
- Changes gop_allocate_bits to return the number of bits allocated like
pic_allocate_bits does.
When --threads=auto was given on the command line, cfg->threads was
actually set to zero, disabling threads altogether. Fixed to set
cfg->threads to -1, so that the number of threads is chosen
automatically.
The CABAC engine only writes to the bitstream when it has a full byte.
These writes are also always byte-aligned, so there is no need to even
check for stream alignment.
Speedup was around 3% with ultrafast and low QP.
Enforce bit depth promised by --input-bitdepth to avoid crashes when
larger values are provided.
Do endianess byte swap for all bytes when the buffer gets extended
to multiple of 8 pixels, and not just the number of input pixels.
Don't swap bytes on a little-endian system.
- Reduce indentation to 6 spaces
- Word wrap everything to under 80 characters
- Remove defaults from options covered by presets
- Add a dash in front of argument descriptions
- Add --(no-) to names of parameters that accept it and remove mention
of enabling or disabling
- Add executable and scripts as a dependancy to make docs
This value is not represented in the HEVC bitstream, which is why it
was not set previously. FFmpeg sets and needs it however, so make the
CLI set it as well to make sure we handle it correctly.
The rd-complexity of slow presets is better with a less agressive GOP.
Adding the GOP as part of the preset improved BDRate enough, that it
didn't make sense anymore to have a veryslow target the best BDRate.
Instead, push that responsibility to placebo by making it a little bit
faster.
GOPs with depth 1 had the same structure as those with depth 2:
g4d3t1 = 3 2 3 1
g4d2t1 = 2 2 2 1
g4d1t1 = 2 2 2 1
It now results in the correct:
g4d1t1 = 1 1 1 1
Coding inter without GOP of any kind really isn't a very sensible
default. Defaulting to B-GOP of some kind would be more better,
but lp-gop is more robust for now.