Commit graph

2386 commits

Author SHA1 Message Date
Ari Koivula ec7507a935 Further optimize get_ep_ex_golomb_bitcost
Unrolled 16-bit log2 calculation.
2016-08-30 21:37:01 +03:00
Ari Koivula a4ba794587 Optimize get_ep_ex_golomb_bitcost
Arrange the decision tree such that there is only 3 branches on the
most common paths and the more likely branch is always fall-through.

A profile guided optimization pass would probably do something similar.
2016-08-30 05:24:16 +03:00
Ari Koivula 82cfab58f8 Improve fast mvd coding cost estimation
A lot of time is being taken up by this function on ultrafast, and it
doesn't do a very good job. This change aims to both simplify the
logic and make the estimate better.

The logic is simplified by using a look up for the step mvd bit cost
step function instead of mimicking the binarization process. The
estimation is made better by checking fractional cabac bit costs.

The new function returns the same results as
kvz_get_mvd_coding_cost_cabac, but is also faster than the old
function.
2016-08-30 04:55:09 +03:00
Ari Koivula d31be8eb27 Make mvd_coding_cost functions take const cabac 2016-08-30 04:46:46 +03:00
Ari Koivula 64d631c174 Fix 8bit to 10bit input conversion regression 2016-08-25 22:09:40 +03:00
Ari Koivula 27789125d8 Fix input bit depth conversion
The input was being shifted to the wrong direction.
2016-08-25 22:05:25 +03:00
Ari Koivula 4ec039004b Add monochrome encoding
Write bitstream without chroma when encoding with --input-format=P400.
This reduces bitstream size by 0-1 %, compared to coding monochrome in
420 format, and speeds up encoding slightly due to not processing
chroma.
2016-08-25 20:15:26 +03:00
Ari Koivula c5b70cf812 Add chroma format support to yuv_t 2016-08-24 19:20:53 +03:00
Ari Koivula 032ed30ff4 Add chroma format support to kvz_picture
Add picture_alloc_csp to libkvz api to allocated pictures with chroma
format different from 420.
2016-08-24 19:20:53 +03:00
Ari Koivula 48ccc26839 Add --input-format and --input-bitdepth
Adds reading of 10 bit input for 10-bit encoding.
2016-08-24 19:20:53 +03:00
Ari Koivula cc08073615 Refactor some indexing weirdness in init_lcu_t
I thought there might be a bug in this so I cleaned it up.
2016-08-24 19:12:48 +03:00
Ari Koivula b6d674d66e Refactor integer vector inter prediction
This code was pretty bad, so I cleaned it up a bit.
2016-08-24 19:09:26 +03:00
Ari Lemmetti 28c4174d0e Fix incorrect shuffle parameters
_MM_SHUFFLE uses reverse order
2016-08-23 19:40:46 +03:00
Ari Lemmetti ce77bfa15b Replace KVZ_PERMUTE with _MM_SHUFFLE
The same exact macro already exists
2016-08-22 19:08:46 +03:00
Jovasa 68eef660bd Fixed search around mv_in in fullsearch not being saved. 2016-08-19 15:19:29 +03:00
Eemeli Kallio 99d8b9abeb Changed skip_rdoq name to kvz_skip_unnecessary_rdoq. Changed the order it uses when it goes through CGs and tuned its sum calculation. 2016-08-18 14:02:56 +03:00
Eemeli Kallio 1fb4755f31 Added rdoq-skip to quant-generic.c 2016-08-18 12:17:54 +03:00
Eemeli Kallio d20ac03ca2 Added --rdoq-skip option 2016-08-18 12:17:53 +03:00
Marko Viitanen 83cf801664 Fixed MV constraint condition in bipred 2016-08-18 08:53:17 +03:00
Marko Viitanen 5ae1c595f2 Fixed slice_temporal_mvp_enabled_flag and disabled TMVP with tiles
- slice_temporal_mvp_enabled_flag should be signalled also with non-IDR I-slices
2016-08-10 14:51:41 +03:00
Marko Viitanen 5326519182 TMVP cleanup and const qualifier fixes 2016-08-10 14:10:43 +03:00
Marko Viitanen f40907260d Added config parameter for TMVP and cmdline option --no-tmvp
- Enabled by default
 - Cannot be used with GOP at the moment
2016-08-10 14:09:29 +03:00
Marko Viitanen fd52dac1f7 Fixed TMVP scaling 2016-08-10 14:09:28 +03:00
Marko Viitanen c664bc8cf7 Added flag collocated_ref_idx to the slice header 2016-08-10 14:09:28 +03:00
Marko Viitanen c5f2611a38 Fixes for TMVP to work with the new CU array 2016-08-10 14:09:28 +03:00
Marko Viitanen d85af5755b TMVP working when only 1 ref frame 2016-08-10 14:09:28 +03:00
Marko Viitanen 39f0165efe Fix a bug in TMVP, the reference cu_array was being overwritten 2016-08-10 14:09:27 +03:00
Marko Viitanen adab8c327e Clean TMVP code 2016-08-10 14:09:20 +03:00
Marko Viitanen 5fa8226ac9 Temporal merge candidate selection 2016-08-10 14:09:20 +03:00
Marko Viitanen f83042f4a1 Temporal MV candidate selection 2016-08-10 14:09:19 +03:00
Marko Viitanen f8671581e3 Implemented function kvz_inter_get_temporal_merge_candidates() 2016-08-10 14:09:19 +03:00
Marko Viitanen 2956bdb379 Added flag slice_temporal_mvp_enabled_flag 2016-08-10 14:09:19 +03:00
Arttu Ylä-Outinen 2a946bd88e Rename encoder_state_t.global to frame
"Frame" is more accurate than "global" since when OWF is used, encoder
states for each frame have their own struct.
2016-08-10 13:22:36 +09:00
Arttu Ylä-Outinen 5fbb0a8c27 Fix includes 2016-08-10 13:05:40 +09:00
Arttu Ylä-Outinen aabf6ca3ee Extract encoding code from encoderstate.c
Moves functions kvz_encode_coding_tree and kvz_encode_coeff_nxn from
encoderstate.c to encode_coding_tree.c.
2016-08-09 22:16:50 +09:00
Arttu Ylä-Outinen 803f29be8f Remove reconstructed picture allocation in lossless.
Changes encoder_set_source_picture to set the reconstructed picture to
a copy of the source picture instead of allocating a new picture when
lossless coding is used.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen aaec473a19 Refactor encoder state initialization.
- Moves allocation of the reconstructed picture after the source picture
  is set.
- Extracts main state initialization to a separate function from
  encoder_state_new_frame.
- Changes kvz_encoder_feed_frame to return the frame.
- Renames some functions to better match their purpose.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen cd7024b3a5 Skip computing SSD when using lossless coding.
The SSD is always zero since it is lossless.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen fbbe5d1844 Use kvz_pixels_calc_ssd for SSD in search.c.
Replaces loops for computing SSDs by calling kvz_pixels_calc_ssd in
search.c.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 22cc97ffb1 Fix missing field initializers. 2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 06b82bf888 Disable filters, trskip and signhide in lossless.
When lossless coding is used, deblock and SAO are skipped, transform
skip flag is not written and sign hiding is not used.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 97451ec401 Align assignments in encoder.c. 2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 1dc94663c3 Bypass transform and quantization with --lossless.
When --lossless is given, set cu_transquant_bypass_flag for every CU and
bypass transform and quantization by directly copying reference pixels
to reconstruction and the residual to coefficients.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 2113b0182d Enable PPS-level tq bypass flag with --lossless.
Sets transquant_bypass_enable_flag to true in PPS when --lossless is
given.
2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen a5897bbece Make cabac context initialization tables static. 2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 23e7d9bb37 Add --lossless command line parameter. 2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen 5372ea432f Update README and manpage. 2016-08-03 14:25:08 +09:00
Ari Lemmetti 6bcba004ff Comment out to fix unused code error on clang. 2016-07-14 14:12:16 +03:00
Ari Lemmetti c0979ebdcb Implement AVX2 luma sampling 2016-07-14 12:53:02 +03:00
Ari Lemmetti 6244560426 Add avx2 strategy for kvz_filter_frac_blocks_luma. 2016-07-14 12:53:02 +03:00
Ari Lemmetti 9c4e9e049b Load only what is needed. Eliminate latency from hadds. 2016-07-14 12:53:01 +03:00
Ari Lemmetti 7f71cb423a Check 4 fractional pixel positions simultaneously 2016-07-14 12:52:24 +03:00
Ari Lemmetti ad445ab8a1 Transition to kvz_filter_frac_blocks_luma 2016-07-14 12:51:02 +03:00
Ari Lemmetti fccfbd2f28 Add strategy for kvz_filter_frac_blocks_luma 2016-07-14 12:51:02 +03:00
Ari Lemmetti e9c3074d32 Add buffers and definitions for upcoming filtering
Samples are to be filtered in separate blocks instead of
making one big picture with interpolated pixels
2016-07-14 12:51:02 +03:00
Ari Lemmetti 7afe7e963b Use fme_level to control the search accuracy. 2016-07-14 12:51:01 +03:00
Ari Lemmetti 5fa323bf25 Skip searching best hpel twice. Make hpel and qpel loops similar. 2016-07-14 12:51:01 +03:00
Ari Lemmetti bc98a9affa Change the search order to suit lighter fme search 2016-07-14 12:51:01 +03:00
Ari Lemmetti 2b0c8db349 Add quad satd for avx2 2016-07-14 12:50:24 +03:00
Ari Lemmetti 0ff69fd6f8 Add any size multi satd 2016-07-14 12:48:37 +03:00
Ari Lemmetti d17b9e7d6e Allow subme parameters 0-4
Update usage, presets,defaults,lib version
2016-07-12 19:49:38 +03:00
Arttu Ylä-Outinen 62ad57d0bf Fix kvz_image_list_add for zero-sized lists.
When a list does not have space for the new element, its size is
doubled. If the size of the list is zero, it would not be resized. Fixed
to always resize the list so that the new element can be added.
2016-06-22 13:35:16 +09:00
Arttu Ylä-Outinen 433e528af7 Drop unused variable in search_pu_inter.
Removes unused variable max_px_below_lcu.
2016-06-22 13:35:16 +09:00
Arttu Ylä-Outinen 7836ff6ec9 Drop unused functions.
Removes functions kvz_coefficients_calc_abs, kvz_intra_rdo_cost_compare
and kvz_rdo_cost_intra which are no longer used.
2016-06-22 13:35:15 +09:00
Arttu Ylä-Outinen e4b5840f56 Add parentheses around macro arguments in cabac.h. 2016-06-22 13:35:15 +09:00
Arttu Ylä-Outinen a387b74e51 Fix resolution auto-detection.
Only try to guess the resolution from filename when neither width nor
height is given.
2016-06-22 13:35:15 +09:00
Arttu Ylä-Outinen 097bf8f3c0 Add a typedef for mvd coding cost functions. 2016-06-20 13:56:10 +09:00
Arttu Ylä-Outinen d3c0e49286 Update comments. 2016-06-16 20:25:08 +09:00
Arttu Ylä-Outinen ae832cda8c Pack cbf flags in cu_info_t to two bytes.
Reduces size of cu_info_t.
2016-06-16 20:24:19 +09:00
Arttu Ylä-Outinen cad2d496b8 Enable 4x8 and 4x16 partition modes
Enables search for 2NxN and Nx2N partition modes for 8x8 CUs and 2NxnU,
2NxnD, nLx2N and nRx2N partition modes for 16x16 CUs.

Changes the loop for copying reconstructed luma pixels in
kvz_inter_recon_lcu to use 4 byte chunks instead of 8 byte chunks since
it is now possible to have 4 pixel wide blocks.
2016-06-16 20:23:16 +09:00
Arttu Ylä-Outinen 90df7350f0 Make deblocking work with 4 pixel wide blocks. 2016-06-16 20:21:50 +09:00
Arttu Ylä-Outinen bf26661782 Add support for 4x4 blocks to SATD_ANY_SIZE.
Makes functions satd_any_size_generic and satd_any_size_8bit_avx2 work
on blocks whose width and/or height are not multiples of 8.
2016-06-16 18:53:17 +09:00
Arttu Ylä-Outinen 2ae260e422 Change width of cells in lcu_t to 4 pixels.
Intra mode info for NxN partition units is now stored in the
corresponding 4x4 cell in lcu_t.cu array.
2016-06-16 18:53:17 +09:00
Arttu Ylä-Outinen 360f5bb8da Always use pixel coordinates for indexing lcu_t.
Removes macro LCU_GET_CU and uses LCU_GET_CU_AT_PX in its place.
2016-06-16 18:53:17 +09:00
Arttu Ylä-Outinen 46e8122d27 Add functions for indexing cu_array_t structures.
Replaces macro CU_ARRAY_AT with functions kvz_cu_array_at and
kvz_cu_array_at_const.
2016-06-16 18:52:19 +09:00
Arttu Ylä-Outinen c5afabdd3b Change width of cells in cu_array_t to 4 pixels. 2016-06-15 12:25:11 +09:00
Arttu Ylä-Outinen 57a3d9b4b9 Add a function for copying CU data from LCUs.
Adds function kvz_cu_array_copy_from_lcu which CU info data from an
lcu_t structure to a cu_array_t structure.
2016-06-15 12:25:11 +09:00
Arttu Ylä-Outinen 2c85a00a55 Change kvz_cu_array_alloc to use pixel dimensions.
Changes function kvz_cu_array_alloc to take width and height parameters
in pixels instead of SCUs.
2016-06-15 12:25:11 +09:00
Arttu Ylä-Outinen b276a347c0 Add a macro for indexing cu_array_t.
Adds macro CU_ARRAY_AT(cu_array, x, y) to cu.h.
2016-06-15 12:25:11 +09:00
Arttu Ylä-Outinen 8ac1f1986e Move CU array copy to a separate function.
Moves code for copying parts of cu_array_t to a new function
kvz_cu_array_copy in cu module.
2016-06-15 12:25:11 +09:00
Arttu Ylä-Outinen 41e75daed7 Fix overlapping memcpy in kvz_search_cu_smp.
The destination and source pointers might be equal. Fixed by replacing
the memcpy call with a simple assignment.
2016-06-15 12:25:11 +09:00
Ari Lemmetti 29af8bcd21 Remove const to match function signature 2016-06-14 18:19:40 +03:00
Eemeli Kallio 5af6ab320c Merge branch 'me_early_terminate'
Conflicts:
	configure.ac
	src/cfg.c
	src/cli.c
	src/kvazaar.h
	src/search_inter.c
2016-06-14 15:03:35 +03:00
Eemeli Kallio 43c7778b82 Updated version number. 2016-06-14 10:53:04 +03:00
Arttu Ylä-Outinen 23fdeeaf10 Move mv_cand and mv_dir into a bitfield in cu_info_t.
Reduces size of cu_info_t.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen 35aadf6776 Reduce size of type in cu_info_t to two bits.
Reduces size of cu_info_t.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen 1cbe844f79 Move inter and intra into an union in cu_info_t.
Reduces size of cu_info_t.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen b6d793ef33 Drop field inter.mvd from cu_info_t
Instead of storing the mv differences in cu_info_t, they are computed
from the mv candidates and the motion vector. Reduces the size of
cu_info_t.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen 98aa906f30 Drop field coded from cu_info_t
It can be inferred from the position and size of the CU.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen ebb10763f1 Drop field inter.mv_ref_coded from cu_info_t.
Storing inter.mv_ref_coded in cu_info_t is unnecessary since it can be
computed from refmap and inter.mv_ref.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen 4be5c8f349 Move flags into a bitfield in cu_info_t.
Reduces the size of cu_info_t.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen 30e9ee988d Move bitcost field out of cu_info_t.inter.
The bitcost is only needed for the currently searched CU.

Fixes bitcost of the second PU being ignored when using SMP or AMP.
2016-06-14 12:21:57 +09:00
Arttu Ylä-Outinen 16d13ed046 Move cost field out of cu_info_t.inter
The cost is only needed for the currently searched CU.
2016-06-14 12:20:05 +09:00
Arttu Ylä-Outinen c5c2c182d9 Drop unused field mode from cu_info_t.inter. 2016-06-14 12:18:17 +09:00
Eemeli Kallio e4f1a74512 Added early termination option for motion estimation.
Conflicts:
	src/search_inter.c
2016-06-13 16:20:35 +03:00
Wassim Hamidouche 5bc7287c67 add fix for crypro 2016-06-09 10:49:31 +03:00
Wassim Hamidouche 35634b5596 correct MV sign encryption 2016-06-09 10:49:31 +03:00
Wassim Hamidouche 15abdc6e81 correct sign encryption 2016-06-09 10:49:31 +03:00
Wassim Hamidouche 73c3203a26 encry coef transfs 2016-06-09 10:49:31 +03:00
Wassim Hamidouche 7ad5f8bbe5 encry coef transf sign 2016-06-09 10:49:31 +03:00
Wassim Hamidouche 02b0712973 fix g++ compilation 2016-06-09 10:48:44 +03:00
Ari Koivula a2170f0763 Compile the cryptopp wrapper only when used
This should allow us to avoid an unnecessary dependancy to a C++
compiler.

Conflicts:
	configure.ac
2016-06-07 17:11:12 +03:00
Ari Koivula 182038c743 Don't allow enabling encryption when it's not compiled in 2016-06-07 16:58:09 +03:00
Ari Koivula 8eb087120e Make VisualStudio ignore the crypto stuff
Add stubs for the crypto functions so we can refer to them, even if we
never use them.
2016-06-07 16:58:09 +03:00
Wassim Hamidouche 76cb6dc6c2 add check flags 2016-06-07 10:54:26 +02:00
Ari Koivula 60ea8a359f Add --crypto parameter 2016-06-07 10:31:40 +02:00
Wassim Hamidouche 02308d1ba6 add MVs encryption 2016-06-07 10:28:30 +02:00
Wassim Hamidouche 4637c8a828 compile Kvazaar encoder with ITpp library 2016-06-07 08:33:04 +02:00
Eemeli Kallio 8f182ac6de Added functions select_starting_point and mv_in_merge to search_inter.c 2016-06-06 17:16:04 +03:00
Ari Koivula fe71638a96 Fix problem with ASM compilation
When compiling C++ files along with C, libtool would complain about
the --tag missing, even though CC should be the default.
2016-06-06 15:47:56 +03:00
Eemeli Kallio 836a3b1daa Added functions select_starting_point and mv_in_merge. 2016-06-06 12:18:33 +03:00
Ari Koivula 4eaacbe23e Fix bug with lp-gop and ratecontrol
The first frame was always qp51 due to gop_offset being -1 for the
first frame. This fix makes it so that bits are allocated as if it was
the last (high quality) frame from the previous GOP.
2016-05-27 15:53:55 +03:00
Ari Koivula 3fbd7ed97f Add GOP layer weights for lowdelay-P
When using ratecontrol with lowdelay-P, this improves BDRate by 1-25%.
Strongest effect is when using 4 layers and multiple references.

Also allow using 1 or 2 layers with ratecontrol.
2016-05-27 13:46:26 +03:00
Ari Koivula 67acead4bc Fix referring over IDR boundary when using --gop
This problem resulted in an illegal bitstream with --gop=lp, because it
uses IDR's. The --gop=8 would not code IDR pictures, even when told to
with -p, which masked this problem.

This fix solves the problem with --gop=lp and also prevents references
across the intra picture in --gop=8. The intra pictures should be set
to IDR in a later fix, or an alternate method of differentiating
between IDR and non-IDR intra should be made.
2016-05-27 13:20:53 +03:00
Ari Koivula a77dc1610e Refactor encoder_state_remove_refs
I needed to debug this, so I rewrote it to make sense. There is an
obvious bug with the IDR handling that I left in place to fix in a
separate commit.
2016-05-27 13:20:45 +03:00
Eemeli Kallio b5c05e58e0 Fixed typo in strategyselector.c 2016-05-24 11:04:29 +03:00
Ari Lemmetti 68c6f0f7b8 Enable deblocking for every preset
Deblocking adds very little complexity
while giving massive coding performance boost
2016-05-17 18:50:31 +03:00
Ari Lemmetti 6a07761b46 Add smp and amp options to presets 2016-05-17 14:26:58 +03:00
Ari Lemmetti 3107a93eaf Fix avx2 chroma sampling for amp 2016-05-17 14:09:57 +03:00
Ari Koivula 24d0f9f685 Fix usage message for --hash 2016-05-11 15:03:43 +03:00
Ari Koivula a1c772b696 Merge pull request #136 from MrAsura/cu-split-termination
Cu split termination

Closes #133.
2016-05-10 17:22:08 +03:00
Jaakko Laitinen 7010526b1d Removed tabs. 2016-05-10 15:52:44 +03:00
Jaakko Laitinen a77eb5c874 Fixed type conversion error when parsing cu split termination. 2016-05-10 14:34:46 +03:00
Jaakko Laitinen 0d361d5bc7 Moved cu split termination from a pre-processor to a input parameter. 2016-05-10 14:15:41 +03:00
Ari Koivula 1dbe4eb852 Merge branch 'mv-full' 2016-05-10 13:28:07 +03:00
Ari Koivula f6a9d237a3 Merge pull request #134 from miimiz/testink_eemeli
Strategyselector prints
2016-05-10 13:27:23 +03:00
Eemeli Kallio 8cfeed852c Added print about SIMD optimizations available and in use to strategyselector. 2016-05-10 12:59:15 +03:00
Ari Koivula f51a68b6fa Add different sizes of search window for full search 2016-04-21 15:11:35 +03:00
Ari Lemmetti efbdc5dade Utilize registers more efficiently for 8x8 and larger blocks 2016-04-21 13:26:38 +03:00
Ari Lemmetti 192cee95b2 Vectorize vertical filtering 2016-04-21 13:26:38 +03:00
Ari Lemmetti 0be35f72b8 Filter 4 pixels simultaneously in x direction 2016-04-21 13:26:38 +03:00
Ari Lemmetti 10484bda9f Make strategies out of fractional pixel sample functions 2016-04-21 13:26:38 +03:00
Ari Koivula 28e7548387 Fix bug in full mv search
This optimization led to some points not being searched.
2016-04-21 12:03:57 +03:00
Ari Koivula 2576aeee0b Use merge candidates in full mv search
Perform a full search window around every mv candidate and the
0-vector.
2016-04-20 20:47:11 +03:00
Ari Lemmetti 8247faf8e0 Remove 64-bit only instruction to fix 32-bit compilation. 2016-04-19 18:05:11 +03:00
Ari Lemmetti eb55d6b6b9 Fix writing over boundary. 2016-04-19 16:03:43 +03:00
Ari Lemmetti bcabc6fadd Remove pixel blit from strategies. Use memcpy instead. 2016-04-06 18:44:04 +03:00
Ari Lemmetti 2140197ccc Tidy up coeff blit function and use memcpy again.
Give memcpy constants for fixed sizes to enable copying many bytes simultaneously.
2016-04-06 18:03:00 +03:00
Ari Koivula 08b4480d94 Re-add time.h include
Include-what-you-use wants to include sys/time.h instead, or if I
override it to include time.h it will remove the include completely.
2016-04-02 19:05:16 +03:00
Ari Koivula 61fc3e87ba Run include-what-you-use fix_includes.py fix_includes.py
The includes should make more sense now and not just happen to compile
due to headers included from other headers.

Used a modified version of IWYU. Modifications were to attribute int8_t
and so on to stdint.h instead of sys/types.h and immintrin.h instead of
more specific headers.

include-what-you-use 0.7 (git:b70df35)
based on clang version 3.9.0 (trunk 264728)
2016-04-01 17:46:55 +03:00
Ari Koivula 016810d982 Move COMPILE_ macro to global.h
While these are only used for strategies, it's non-intuitive to have
to include strategyselector.h in every file under strategies before
including anything else.
2016-04-01 17:46:55 +03:00
Ari Koivula 8908d85d66 Change all relative includes to absolute 2016-04-01 17:46:44 +03:00
Ari Koivula 4876879b82 Add IWYU pragmas 2016-03-31 12:33:34 +03:00
Marko Viitanen 41a5f9bbbe Fix filetime conversion to timespec 2016-03-24 10:08:11 +02:00
Ari Koivula 9139e169fe Fix unnecessary waiting in main thread
The main thread has to wait for the worker threads to finish. The
pthread_cond_timedwait call used to accomplish this was given
a relative instead of absolute time, which resulted in the call
returning immediately, because the time had already passed.

This removes the now unnecessary sleeps and fixes the time given to
the pthread_cond_timedwait such that it now waits until a job finishes
or 100ms have passed.
2016-03-23 22:23:04 +02:00
Ari Koivula e23ed231fb Fix race condition with owf and non-square motion partitions
The OWF wpp limit code assumed square blocks, and as such did not work
correctly when height != width. This changes the relevant code to consider
both height and width.
2016-03-22 16:46:38 +02:00
Arttu Ylä-Outinen d6a3e02f16 Fix calculating reference CU index in inter search
Fixes a possible segfault when SMP or AMP blocks are used.
2016-03-22 12:55:58 +02:00
Ari Lemmetti f4538ab474 Copy pixels more efficiently in lcu recon. 2016-03-18 20:10:03 +02:00
Ari Koivula 5b66578f71 Add kvz_ prefix to md5 functions
The non kvz_ symbols were being exported in the static lib, which got caught
by Travis tests.
2016-03-18 13:13:35 +02:00
Ari Koivula 4125218cfa Add --hash=md5
Add md5 through extras/libmd5 taken from HM with BSD license. It's
implemented as a generic strategy using the same interface as checksum,
so we can write a SIMD version if it seems necessary.
2016-03-18 05:23:57 +02:00
Ari Koivula 883448b8fb Add --hash parameter
Allows decoded picture hash to be selected among none and checksum.
2016-03-18 05:20:15 +02:00
Ari Lemmetti 6d5f8e3aec Define KVZ_COMPILE_ASM for the correct files.
Enables asm strategies again.
2016-03-17 16:21:31 +02:00
Ari Lemmetti e502292ba8 Remove old function 2016-03-16 20:18:55 +02:00
Ari Lemmetti c6cc96f5ec Optimize sao band ddistortion 2016-03-16 20:16:00 +02:00
Ari Lemmetti ab577f476f Optimize sao reconstruct color 2016-03-16 20:15:32 +02:00
Ari Lemmetti 48bfddf4ec Optimize calc sao edge dir 2016-03-16 20:14:50 +02:00
Ari Lemmetti ba69992941 Optimize sao edge ddistortion 2016-03-16 20:14:19 +02:00
Ari Lemmetti 941b6b3e27 Optimize calc eo cat 2016-03-16 20:13:30 +02:00
Ari Lemmetti 04fbb48a09 Add strategy for avx2. Copy generic functions there. 2016-03-16 20:13:15 +02:00
Ari Lemmetti 4e30a215d8 Create generic strategy for sao. 2016-03-16 20:11:15 +02:00
Ari Koivula 6f431e510c Comment and tidy threadqueue_worker
Carefully avoided making any changes to the logic.
2016-03-14 20:08:04 +02:00
Ari Koivula 1165ae2e1f Increase --mv-constraint=frametimemargin margin
Increase the margin to be 4 luma pixels to every direction.
2016-03-14 16:02:54 +02:00
Arttu Ylä-Outinen 0eda28ced6 Fix Visual Studio warnings
Initialization of a struct with addresses of local variables generated
warning C4221 in encmain.
2016-03-14 14:12:21 +02:00
Ari Koivula e91ca74733 Refactor kvz_encode_last_significant_xy 2016-03-10 18:47:16 +02:00
Ari Koivula 1fc0e8076c Format kvz_encode_last_significant_xy whitespace 2016-03-10 18:17:45 +02:00
Ari Koivula df9a958ef2 Merge branch 'log2' 2016-03-10 18:16:41 +02:00
Ari Koivula 4112a4364d Remove g_to_bits table 2016-03-10 15:59:51 +02:00
Ari Koivula 9fcfba637f Remove duplicated inline functions 2016-03-10 15:28:31 +02:00
Ari Koivula e27ec2cc53 Add kvz_math.h for common inline math functions
Calling it just math.h would have prevented including system math.h.
2016-03-10 15:26:18 +02:00
Ricardo Constantino c515796a21 Only use version prefix in kvazaar binary
Fixes regression since 54f08f2 causing libkvazaar version checks to not
work (i.e. pkg-config)
2016-03-09 16:13:59 +00:00
Arttu Ylä-Outinen 54f08f2bdb Use output of git describe as version. 2016-03-09 15:04:29 +02:00
Ari Koivula f8edf28161 Fix const qualifier warning
Also set the warning to an error in VS.
2016-03-09 14:16:15 +02:00
Ari Koivula b0c3ece31e Fix race condition when deblocking is on but SAO is off
Already suspected this yesterday, but didn't want to add the code to
handle it before confirming that it's actually a problem. It is.
2016-03-09 14:02:46 +02:00
Ari Koivula 1671725c72 Fix non-determinism issue with OWF WPP margin
The previous reasoning used deblocking and fractional motion estimation
together to arrive at a margin of 4 pixels. This was wrong, and with
either of these off, half pixel chroma interpolation could use pixels
outside the intended region.

Deblocking does not currently affect the margin needed.
2016-03-08 20:18:38 +02:00
Ari Koivula 674bfa14ce Comment WPP deblocking and SAO
I was a bit unclear about exactly what happens and when regarding SAO
and deblocking when we do frame-parallel WPP parallelism, so I checked
and commented the bits that were unclear to me.
2016-03-08 19:39:04 +02:00
Ari Koivula aec152c953 Fix OWF mv restriction limit
The check was done in regard to the wrong dimension, allowing the
access to unfinished parts of the frame when coding multiple frames
at the same time.
2016-03-08 17:12:43 +02:00
Ari Koivula fda103aa7c Refactor cfg->tiles_width_count and cfg->tiles_height_count
Change code everywere so these actually mean "width count" and not
"width count minus one".
2016-03-07 17:29:15 +02:00
Ari Koivula a350eb3a1e Fix --tiles to have the correct number of tiles.
The tiles_width_count etc. actually mean "count minus one".
2016-03-07 17:24:31 +02:00
Ari Koivula 49ea2d7b7f Fix --mv-constraint=frametile
Option --mv-constraint=frametilemargin was being used instead of
frametile.
2016-03-07 16:41:00 +02:00
Ari Koivula 95b8dd99f6 Add --tiles parameter
Add new parameter --tiles that accept only uniform split. I considered
supporting the syntax of --tiles-width-split for this, but writing
--tiles=u2xu2 is just not as intuitive as --tiles=2x2, and there is
hardly ever any reason to use anything but uniform split. The more
cumbersome --tiles-width-split and --tiles-height-split parameters
are still there to allow finer control.
2016-03-07 16:33:51 +02:00
Ari Koivula fd34dd9bc6 Fix race condition with OWF
There was an off by one error in the dependance setting code, which
resulted in dependencies not being set resulting in checksum errors.
For example if ref_neg=1 and owf=1.
2016-03-07 13:38:23 +02:00
Ari Koivula 81b439f4da Optimize starting point selection in tz
Avoid checking zero motion vectors multiple times. The merge candidate
list often has only one or two candidates, the other being zeroes.
2016-03-04 16:48:46 +02:00
Ari Koivula 2436702c27 Optimize starting point selection in hexbs
Avoid checking zero motion vectors multiple times. The merge candidate
list often has only one or two candidates, the other being zeroes.
2016-03-04 16:48:12 +02:00
Ari Koivula 5327b59b45 Remove KVZ_PERF_SEARCHPX
It's too invasive and we don't really need it.
2016-03-04 16:48:12 +02:00
Arttu Ylä-Outinen 348ac4888b Fix calc_mode_bits.
The CUs left and above the current one would be set to NULL when there
was only one CU between the current one and the left or top edge of the
frame.
2016-03-04 14:08:35 +02:00
Ari Koivula 86219aa0fc Fix non-determinism with tiles
Earlier fix that fixed the supply side of the cu_array to take tile
coordinates into account should have been accompanied with this one
that does the same thing to demand side.
2016-03-03 17:39:20 +02:00
Arttu Ylä-Outinen 626b53ce85 Move sao search from encoderstate to sao.
Moves sao search from function encoder_state_worker_encode_lcu in
encoderstate.c to function kvz_sao_search_lcu in sao.c. Makes functions
kvz_init_sao_info, kvz_sao_search_chroma and kvz_sao_search_luma static
since they are no longer used outside sao.c.
2016-03-01 14:56:16 +02:00
Ari Koivula cfa722e448 Reduce parallelism for tiles
There is still some race-condition with encoding tiles from multiple
frames, so disable this to keep the bitstream deterministic.
2016-02-29 20:20:21 +02:00
Ari Koivula 3dcc0957f8 Deal with impossible mv constraints
If 0,0 vector is illegal, it's possible that no legal movement vector,
is found, in which case a large cost is returned instead. The cost
overflowed and there is all sorts of silliness with converting from
double to int, but I'm not going to fix all of it because when we
remove the doubles it will all get fixed.
2016-02-29 19:18:14 +02:00
Ari Koivula b1adf1576a Add --mv-constraint=frametilemargin
Add an even stricter motion vector constraint to prevent motion vectors
to fractional pixel positions that would need pixels outside the tile.
2016-02-29 19:18:14 +02:00
Ari Koivula f808cbf608 Allow increased parallelism for tiles
When movement vectors are constrained to tiles, only the same tile in
previous frame needs to be depended upon.
2016-02-29 14:33:06 +02:00
Ari Koivula f4ebff12b0 Combine tile mv constraint with OWF mv constraint
This also fixes movement vectors in tiles when OWF is on. The OWF mv
constraint assumed WPP, so it didn't work with tiles.
2016-02-29 14:33:06 +02:00
Ari Koivula 7981609cd0 Add --mv-constraint=frametile 2016-02-29 14:33:06 +02:00
Ari Koivula 9dbbb7fdbc Add --mv-constraint argument 2016-02-29 14:33:06 +02:00
Ari Koivula 1be877faf9 Fix chroma reconstruction with tiles
An incorrect frame boundary check caused a checksum error, because the
chroma reconstruction of the encoder was wrong. The encoder treated
horizontal tile boundaries as frame boundaries when the vertical
component of the movement vector was a multiple of 8.
2016-02-29 14:32:51 +02:00
Ari Koivula c0dc490dd1 Fix inter non-determinism with tiles
CU data was being copied to the wrong place in the reference frames
cu_array, which led to uninitialized data being used as a starting
point for motion vector search.

Fixes #99.
2016-02-26 17:05:04 +02:00
Ari Koivula 719d72925b Add loop-input option
This option is useful for testing long encodes, as you don't have to
find an actual infinite input.
2016-02-18 20:00:55 +02:00
Ari Koivula d23a5a15f1 Fix overflow in rate control
A 32 bit int overflowed after 2^31 bits (2Gb). It will still overflow
eventually, after 500 years of outputting 1Gb/s, but by that time,
I recon we will have fixed this properly and it's time to upgrade.
2016-02-18 16:48:21 +02:00
Ari Koivula eeafe14946 Clean up search initialization
Copy lcu explicitly instead of initializing with the same parameters.
2016-02-17 14:57:31 +02:00
Arttu Ylä-Outinen e5c84c361c Eliminate a race condition with input thread.
Changes communication between the input thread and main thread in
encmain.c so that only one of them uses img_in and retval at a time.
Fixes a race condition which would sometimes result in a deadlock.
2016-02-17 12:09:19 +02:00
Ari Koivula c40ede56ad Allow more frame parallelism in LP-gop
Add dependency to the reference frame instead of the previous frame,
in order to allow more frames to be encoded in parallel when temporal
stepping >1 in LP-gop (such as --gop=lp-g8d4r1t2).
2016-02-05 17:08:24 +02:00
Arttu Ylä-Outinen 40c7198f7d Add a script for updating README
Adds script tools/update_readme.sh for regenerating the "Using Kvazaar"
section of README.md from the output of "kvazaar --help".
2016-02-05 16:21:39 +02:00
Arttu Ylä-Outinen aac5373095 Fix typos in documentation
Fixes a few typos in README and command line help.
2016-02-05 16:21:27 +02:00
Ari Koivula a4915dc547 Update man and README 2016-02-04 14:16:58 +02:00
Ari Koivula e941e21cd6 Enable errors about non-existing CLI options
Set opterr and optind to their normal default values.
2016-02-04 13:48:58 +02:00
Ari Koivula 7a4bf94a52 Add --version and --help
Also don't print help by default, because it's too long. Print a
shorter usage message instead.
2016-02-04 13:48:48 +02:00
Ari Lemmetti 99e37ec235 Update old pixel type to the current one 2016-01-30 19:33:09 +02:00
Ari Koivula c76a0951cf Change version to 0.8.3 2016-01-28 21:21:02 +02:00
Ari Koivula cb2121b1aa Double time scale when field coding is used 2016-01-28 21:04:52 +02:00
Ari Koivula 8ad7d2a714 Move interlacing stuff to libkvazaaar API
This moves the interlacing from CLI code to api->encoder_encode, in
order to make it possible to use field coding through the lib API.

The field order is now determined per frame, as FFmpeg gives it per
frame and it's signaled per frame.

As a side effect, the CLI also now prints info from frames instead of
fields. While we might want to extend the API in the future to allow
printing of more detailed information about fields, for now it's
more important that the CLI uses the real lib API.

PSNR calculation for interlaced frames disabled until we have a way to
avoid deinterlacing the frame when it's not necessary.
2016-01-27 15:29:45 +02:00
Ari Koivula 6952f0fcc6 Refactor interlaced reading
Doesn't change the way it works. Just rearranges things so it's easier
to see what is going on.
2016-01-26 13:42:41 +02:00
Ari Koivula a46351efe1 Fix out of bounds error in interlacing
When field height was padded to a multiple of 8, yuv_io_extract_field
would read outside the buffer.
2016-01-26 13:41:52 +02:00
Arttu Ylä-Outinen 49677810b5 Rename config module to cfg.
Prevents a conflict with config.h and src/config.h so that the config.h
generated by configure is included in global.h. Fixes problems with
large input files on 32-bit systems.
2016-01-25 12:26:46 +02:00
Marko Viitanen 8e6c12b859 Merge branch 'input_reading_thread' 2016-01-25 12:00:03 +02:00
Marko Viitanen b4a4ce848c Use field parity for extracting correct fields from the interlaced picture 2016-01-25 10:58:12 +02:00
Marko Viitanen 441ce7728f Fix for input_read_thread() in the case when interlaced source-scan-type is used 2016-01-25 10:57:51 +02:00
Marko Viitanen 198204a20a Fix when using --source-scan-type=bff, offset was used for output lines 2016-01-25 10:13:51 +02:00
Ari Koivula 22b8ed43dc Remove global.h include from kvazaar.h
It shouldn't have been put there as it's the lib interface.
2016-01-22 15:23:34 +02:00
Ari Koivula 249c88011e Fix problem with >2GB input files on 32bit 2016-01-22 15:15:02 +02:00
Ari Koivula fa1af14637 Fix includes to include global.h first everywhere 2016-01-22 15:07:49 +02:00
Ari Koivula 3bf278529c Fix interlacing when using lib interface
Some flags used for interlacing were set in CLI interface, which
meant that interlacing didn't work correctly when used through
libkvazaar.
2016-01-22 14:35:20 +02:00
Marko Viitanen 0128ee26e7 Clear img_in pointer after reading it 2016-01-22 14:29:35 +02:00
Marko Viitanen b5459c1f23 Fixed performance monitoring by adding KVZ_ prefix to GET_TIME 2016-01-22 11:27:25 +02:00
Marko Viitanen e36237335e Fixed memory leaks caused by the input handler thread and cleaned up the code 2016-01-22 11:27:25 +02:00
Marko Viitanen ad9a1f6539 Input thread implementation
- Handle input processing in a separate thread to allow main thread more time with thread handling etc
  - Significant speedup can be seen when run on ultrafast settings and on a system with great number of cores
2016-01-22 11:27:25 +02:00
Ari Koivula 5e734593c0 Add psnr argument to CLI
To disable calculation of PSNR for frames, printing 0.0dB instead.
2016-01-21 15:08:34 +02:00
Ari Koivula 9eba3a83cc Add compiler flag checking to configure 2016-01-20 16:32:34 +00:00
Arttu Ylä-Outinen d452709795 Fix compiling AVX2 strategies.
Option -mavx2 was omitted when compiling AVX2 strategies. This commit
moves strategies to convenience libraries so that their compilation
flags can be easily set and adds -mavx2 to CFLAGS of the AVX2 library.
2016-01-20 11:04:12 +02:00
Ari Koivula 8060e2f6ec Delete kvazaar_version.h
It's not used anymore.
2016-01-19 20:40:35 +02:00
Ari Lemmetti 44656aeb19 Remove useless calculation 2016-01-19 16:35:16 +02:00
Marko Viitanen e822c16659 Removed unneeded cpu flags causing compiling to fail on powerpc, closes #121 2016-01-18 08:55:32 +02:00
Ari Koivula c8c0b4e8e8 Change version number for v0.8.2 2016-01-15 19:42:07 +02:00
Ari Koivula e2402c0000 Remove kva_api_get versioning.
We have soname versioning now, so we should focus on getting that right
instead. This also serves as an example of correctly incrementing the
lib-version.
2016-01-15 19:39:24 +02:00
Ari Koivula caf809f26d Remove scons build scripts
Because we are not going to maintain them.
2016-01-15 17:35:35 +02:00
Ari Koivula 15e1110997 Remove reference to Makefile-old
Makefile-old was deleted and this reference breaks make dist.
2016-01-15 17:32:54 +02:00
Ari Lemmetti a9decd2f40 Bump for yet another release 2016-01-14 23:23:11 +02:00
Ari Koivula 7718ac378f Add fractional FPS support.
Now that we put the timing info into the bitstream, the time base must
be precisely known. Represent framerate as a fraction and add timing
info only if the old floating point framerate was not used.

Deprecate cfg->framerate so it can be removed once we get patches to
FFmpeg and libav.

Add support for (num)/(denom) format to --input-fps.
2016-01-14 22:16:53 +02:00
Ari Lemmetti a9bd7b9e63 Bump version numbers for release v0.8.0 2016-01-14 20:38:28 +02:00
Ari Lemmetti b605e3866e Bye bye Makefile 2016-01-14 20:38:01 +02:00
Marko Viitanen 242edf98ad Added calculation and writing of VUI num_units_in_tick and time_scale 2016-01-14 15:32:33 +02:00
Ari Lemmetti daf39e348f Add dedicated handling for blitting NxN coeffs when N is 4, 8 or 16 2016-01-13 19:27:45 +02:00
Ari Lemmetti a2fc9920e6 Merge branch 'alternative-satd' 2016-01-13 15:00:43 +02:00
Ari Lemmetti 1ed34f2df8 Add some planar pred optimization for blocks larger than 8x8 2016-01-13 14:50:17 +02:00
Ari Lemmetti 0df88697ff Copy generic function to AVX2 strategy 2016-01-12 23:51:18 +02:00
Ari Lemmetti 62799a9fc3 Create generic strategy of planar prediction 2016-01-12 23:50:47 +02:00
Ari Lemmetti 3cb1cebfe5 Add missing inlines 2016-01-12 23:03:31 +02:00
Ari Lemmetti 6a0b13b8b6 Remove unused functions 2016-01-12 22:55:37 +02:00
Ari Lemmetti 61155f0edd Add 128-bit version of the functions as well 2016-01-12 22:52:00 +02:00
Ari Lemmetti a6afb8a8f4 Small refactoring 2016-01-12 22:29:33 +02:00
Ari Lemmetti a756f6133a Manually unroll vertical Hadamard transform 2016-01-12 21:45:02 +02:00
Ari Lemmetti 66350aa20e Experiment with alternative implementation of FWHT 2016-01-11 16:25:56 +02:00
Arttu Ylä-Outinen e14858f41a Fix build and tests.
- Remove non-existent file interface_main.c from library sources.
- Add file mv_cand_tests.c to test sources.
2015-12-21 16:03:55 +02:00
Arttu Ylä-Outinen 9abdee7cc3 Merge branch 'autotools' 2015-12-21 15:54:30 +02:00
Arttu Ylä-Outinen eb6fa3d980 Fix exporting functions in library.
Rewrites definition of macro KVZ_PUBLIC in kvazaar.h so that
KVZ_STATIC_LIB need not be defined when building a static library.
2015-12-21 14:38:59 +02:00
darealshinji 8427a85d36 Add tests 2015-12-19 08:24:35 +01:00
Ari Koivula 1270da3626 Move files under their modules in Visual Studio
Also moves CLI stuff under CLI project, so they are compiled as their
own lib just like when the Makefile is used.

The file interface_main.c was an artifact from a bygone era and should have
been deleted long ago.
2015-12-17 15:39:45 +02:00
Ari Koivula 947bae24f9 Update Doxygen documentation
Add module information to all header files.

Update all header file documentations to briefly say what they are, and
to use the javadoc format so the brief actually gets included into the
doxygen documentation.

Remove \file from implementation files, in order to not repeat the info
from the header files.

Add files under strategies and tools to Doxygen and update the Doxygen
settings to be just plain better.

Make README be the main page of Doxygen documentation.
2015-12-17 14:05:50 +02:00
Ari Koivula a6ea705e19 Add missing lambda to some bit costs
Bits were being added to rate distortion without being multiplied by
lambda in a few places. Fixing this bug also finally allows us to remove
the magic bits from the Coding Unit split decision.

I tried to find new optimum value for CU_COST and it turned out to be 2
for veryslow and 0 for superfast. The difference between 0 and 2 on
veryslow was only 0.1% however, so I don't think this parameter is
needed any longer. Before this fix the effect of removing CU_COST would
have been 0.8%.
2015-12-15 16:32:38 +02:00
Arttu Ylä-Outinen 0e33049d9e Enable full mv search once again.
- Updates function search_mv_full so that it compiles and handles
  non-square blocks.
- Enables compilation of search_mv_full.
- Sets full search radius to 32.
- Enables selecting full mv search with "--me full".
2015-12-15 12:26:26 +02:00
Arttu Ylä-Outinen dbb9b0df85 Enable search for AMP blocks. 2015-12-15 11:21:46 +02:00
Arttu Ylä-Outinen 7e4f4538a4 Implement encoding AMP part modes.
Also adds parameter --amp for enabling AMP blocks.
2015-12-15 11:21:45 +02:00
Arttu Ylä-Outinen c3716f7803 Add --smp option for enabling SMP blocks. 2015-12-15 11:21:45 +02:00
Arttu Ylä-Outinen 38b881c36f Implement search_frac for rectangular blocks.
Replaces parameter depth of function search_frac with parameters width
and height.
2015-12-15 11:21:45 +02:00
Arttu Ylä-Outinen 864c77f6eb Use kvz_satd_any_size in inter search.
Changes search_frac and kvz_search_cu_iter to use kvz_satd_any_size for
computing the SATDs instead of getting the SATD function with
kvz_pixels_get_satd_func.
2015-12-15 11:21:45 +02:00
Arttu Ylä-Outinen 056fa09ba5 Add arbitrary-sized SATD functions.
Adds strategy satd_any_size for generic and AVX2. The satd_any_size
functions are implemented with macro SATD_ANY_SIZE defined in
strategies-picture.h.
2015-12-15 11:21:45 +02:00
Arttu Ylä-Outinen 6bdc08b6eb Drop unused function declaration.
Removes declaration of non-existent function satd_8bit_8x8_generic in
strategyselector.h.
2015-12-15 11:21:44 +02:00
Arttu Ylä-Outinen 728a6abecc Extract macro SATD_NxN.
Combines definitions of macros SATD_NXN and SATD_NXN_AVX2 to macro
SATD_NxN and moves it to strategies-picture.h.
2015-12-15 11:21:44 +02:00
Arttu Ylä-Outinen 1eebfde0c5 Make tz search work with non-square blocks.
Replaces parameter depth with parameters width and height.
2015-12-15 11:21:44 +02:00
Arttu Ylä-Outinen e203883f3d Refactor kvz_filter_deblock_lcu.
Moves code for filtering the rightmost 4 pixels of an LCU to a separate
function filter_deblock_lcu_rightmost.
2015-12-15 11:21:44 +02:00
Arttu Ylä-Outinen 21ca74fe86 Replace deblock filter with a simple loop.
- Adds function is_pu_boundary.
- Moves code for filtering an edge of a single PU or TU to a new
  function filter_deblock_unit.
- Replaces recursive CU tree traversal in filter_deblock_cu with
  a simple loop and renames it to filter_deblock_lcu_inside.
2015-12-15 11:21:43 +02:00
Arttu Ylä-Outinen 7516fda970 Make fractional recon work with non-square blocks.
Adds parameter block_height to functions inter_recon_frac_luma,
inter_recon_14bit_frac_luma and inter_recon_14bit_frac_chroma so that
they can handle SMP blocks.
2015-12-15 11:21:43 +02:00
Arttu Ylä-Outinen 591a1ce6db Turn some inter recon functions static.
Makes the following functions static since they are not used outside
inter.c:
- kvz_inter_recon_frac_luma
- kvz_inter_recon_14bit_frac_luma
- kvz_inter_recon_frac_chroma
- kvz_inter_recon_14bit_frac_chroma
2015-12-15 11:21:43 +02:00
Arttu Ylä-Outinen 0f531362bf Enable Nx2N partitions. 2015-12-15 11:21:43 +02:00
Arttu Ylä-Outinen 4402e251ae Fix kvz_get_extended_block functions.
The buffers allocated in functions kvz_get_extended_block_avx2 and
kvz_get_extended_block_generic were too small when the width of the
block was less than its height. Fixed to allocate correctly sized
buffers.
2015-12-15 11:21:43 +02:00
Arttu Ylä-Outinen bdd8b1c0aa Implement 2NxN partitions in inter search.
- Try using 2NxN partitions after the usual 2Nx2N.
- Adds function kvz_search_cu_smp to search_inter module.
2015-12-15 11:21:42 +02:00
Arttu Ylä-Outinen 410064e880 Split lcu_set_inter into two functions.
Moves code for setting the inter modes for a single PU to a new function
lcu_set_inter_pu.
2015-12-15 11:21:42 +02:00
Arttu Ylä-Outinen 3236428e4d Make hexbs search work with non-square blocks.
Replaces parameter depth with parameters width and height.
2015-12-15 11:21:42 +02:00
Arttu Ylä-Outinen 31ba8d61c3 Implement fractional chroma recon for SMP blocks.
Adds parameter block_height to function kvz_inter_recon_frac_chroma.
2015-12-15 11:21:42 +02:00
Arttu Ylä-Outinen 0b6cef7be5 Remove unused function kvz_inter_set_block. 2015-12-15 11:21:42 +02:00
Arttu Ylä-Outinen e63486b23f Make lcu_set_inter work with SMP blocks. 2015-12-15 11:21:41 +02:00
Arttu Ylä-Outinen 7b99eb2970 Call recon functions correctly for SMP blocks.
Makes calls to kvz_inter_recon_lcu and kvz_inter_recon_lcu_bipred in
function search_cu work correctly when using SMP blocks.
2015-12-15 11:21:41 +02:00
Arttu Ylä-Outinen dc4525c0e3 Implement inter recon for non-square blocks.
Adds parameter height to functions kvz_inter_recon_lcu and
kvz_inter_recon_lcu_bipred and makes them work on non-square sizes.
Fractional reconstruction functions do not handle non-square blocks yet.
2015-12-15 11:21:41 +02:00
Arttu Ylä-Outinen f874c8614e Add part_mode binarization table comment. 2015-12-15 11:21:41 +02:00
Arttu Ylä-Outinen c77074a7ff Implement encoding SMP blocks. 2015-12-15 11:21:41 +02:00
Arttu Ylä-Outinen 98707a1288 Move encoding intra CU to a separate function.
Moves code for encoding a single intra coding unit from function
kvz_encode_coding_tree to a new function encode_intra_coding_unit.
2015-12-15 11:21:40 +02:00
Arttu Ylä-Outinen c336674da3 Move encoding part mode to a separate function.
Moves code for encoding the part mode from function
kvz_encode_coding_tree to a new function encode_part_mode.
2015-12-15 11:21:40 +02:00
Arttu Ylä-Outinen ac952cbb44 Move encoding inter PUs to a separate function.
Moves code for encoding a single inter prediction unit from function
kvz_encode_coding_tree to function encode_inter_prediction_unit.
2015-12-15 11:21:40 +02:00
Arttu Ylä-Outinen 5ee9f164e8 Add macros for getting PU location and size.
- Moves SIZE_* definitions to cu.h.
- Adds constant arrays kvz_part_mode_num_parts, kvz_part_mode_offsets
  and kvz_part_mode_sizes for storing the number of PUs, PU offsets and
  PU sizes.
- Adds macros PU_GET_X, PU_GET_Y, PU_GET_W and PU_GET_H for getting the
  location and size of a PU.
2015-12-15 11:21:40 +02:00
Arttu Ylä-Outinen a3df13fb99 Make kvz_inter_get_merge_cand work with SMP blocks.
- Replaces parameter depth with parameters width and height.
- Adds parameters use_a1 and use_b1 for disabling the use of merge
  candidates A1 and B1.
2015-12-15 11:21:40 +02:00
Arttu Ylä-Outinen 1cd149fb97 Check merge/mv candidate types earlier.
Moves checks for motion vector prediction and merge candidate block
types (inter/intra) from functions kvz_inter_get_mv_cand and
kvz_inter_get_merge_cand to kvz_inter_get_spatial_merge_candidates.
2015-12-15 11:21:39 +02:00
Arttu Ylä-Outinen 969c91d7c4 Add a test for kvz_inter_get_spatial_merge_candidates. 2015-12-15 11:21:39 +02:00
Arttu Ylä-Outinen 02375bf7e5 Make kvz_inter_get_mv_cand work with SMP blocks.
Replaces the depth parameter of kvz_inter_get_mv_cand with parameters
width and height.
2015-12-15 11:21:39 +02:00
Ari Koivula 3a80c7de74 Further optimize coefficient coding
Remove the need to count the coefficients by populating the significant
coefficient group map first and finding the last coefficient from the
last group afterward. The speedup is about 2% on ultrafast.

The previous version of this patch was reverted due to a bug, which
has now been fixed.
2015-12-11 16:47:55 +02:00
Ari Lemmetti b78460b02c Optimize another loop 2015-12-11 11:21:43 +02:00
Ari Koivula b32965925e Revert "Further optimize coefficient coding"
This reverts commit 25462124f8.

That commit broke the bitstream. If it's not good enough to push on Friday
night, it's probably not good enough on Monday morning either.
2015-12-07 15:12:04 +02:00
Ari Koivula 865c86fef2 Remove unused variable 2015-12-07 10:32:18 +02:00
Ari Koivula 91631a1c36 Merge branch 'coeff-optimization' 2015-12-07 10:25:46 +02:00
Ari Koivula 25462124f8 Further optimize coefficient coding
Remove the need to count the coefficients by populating the significant
coefficient group map first and finding the last coefficient from the
last group afterward.
2015-12-07 10:23:01 +02:00
Ari Koivula c94707e6e8 Fix bug with OWF+FME+deblocking
Increases the MV safety margin of OWF from 2 to 3 when deblocking
is used and 4 when both deblocking and FME are used.

Fractional pixel motion estimation can move the vector one more pixel
down causing checksum error. This fixes that error by increasing the
OWF safety margin and changes the interface, so that different margin
can be used when FME or deblocking are not in use.
2015-12-04 15:26:56 +02:00
darealshinji fe2ff12244 fix building with autotools 2015-12-03 22:41:24 +01:00
darealshinji b6d3510c2e pkg-config: move -lm to Libs.private 2015-12-03 22:39:27 +01:00
Ari Lemmetti 6fe223c4dc Nonzero calculation magic 2015-12-03 18:29:44 +02:00
Ari Lemmetti f2d8cd4d64 Merge branch 'intra-search-multi' 2015-12-03 17:25:52 +02:00
Ari Lemmetti c4e1552ef6 Replace original rough intra search 2015-12-03 17:13:11 +02:00
Ari Lemmetti ee8c2d0218 Add 4x4 dual SATD for AVX2 2015-12-03 17:13:11 +02:00
Ari Lemmetti 00736fa708 Generate larger than 8x8 dual satd functions with macro 2015-12-03 17:13:11 +02:00
Ari Lemmetti bd3e1922cd Add AVX2 8x8 dual hadamard transform 2015-12-03 17:13:11 +02:00
Ari Lemmetti d575b94357 Implement generic functions for dual sad / satd 2015-12-03 17:13:11 +02:00
Ari Lemmetti 183ee53f47 Add alternative version of rough intra search.
Calculate two costs simultaneously to exploit larger SIMD registers.
Implementation for dual functions missing currently.
2015-12-03 17:12:38 +02:00
darealshinji 8ff28ec974 Make dynamic linking easier 2015-12-01 14:34:08 +01:00
Arttu Ylä-Outinen 21e19067fe Extract inter search in a single ref frame.
Moves code for doing inter search in a single reference frame from
function kvz_search_cu_inter to a new function search_cu_inter_ref.
2015-11-18 11:16:27 +02:00
Arttu Ylä-Outinen f9f3d5929e Use macros for indexing cu_array in lcu_t.
Replaces accesses cu_array with macro calls and adds macros
LCU_GET_TOP_RIGHT_CU and CU_GET_CU.
2015-11-18 11:16:27 +02:00
Arttu Ylä-Outinen 8db8f3d523 Use macro SUB_SCU where possible.
Replaces expressions like (x & 0x3f) with SUB_SCU(x).
2015-11-18 11:16:26 +02:00
Arttu Ylä-Outinen 9532d79adb Add macros for indexing cu array in lcu_t.
- Adds macros LCU_GET_CU and LCU_GET_CU_AT_PX to cu.h.
- Replaces accesses to the cu array of lcu_t by calls to these macros.
2015-11-18 11:16:26 +02:00
Arttu Ylä-Outinen 39302b0328 Refactor inter_clear_cu_unused.
Replaces duplicated code with a for loop.
2015-11-18 11:16:26 +02:00
Arttu Ylä-Outinen 33208ac9fb Add a comment explaining the cu array in lcu_t. 2015-11-18 11:16:25 +02:00
Arttu Ylä-Outinen e0b02599a5 Refactor filter_deblock_edge_ functions.
Replaces repetitive calls to kvz_filter_deblock_luma and
kvz_filter_deblock_chroma with loops in functions
filter_deblock_edge_luma and filter_deblock_edge_chroma.
2015-11-18 11:12:32 +02:00
Arttu Ylä-Outinen 43fc6ac419 Mark deblock functions static.
Marks the following functions static and removes them from filter.h
since they are not used outside the filter module.
- kvz_filter_deblock_luma
- kvz_filter_deblock_chroma
- kvz_filter_deblock_edge_luma
- kvz_filter_deblock_edge_chroma
- kvz_filter_deblock_cu
2015-11-18 11:12:31 +02:00
Arttu Ylä-Outinen c93a190940 Refactor deblocking filter functions.
- Replace parameter depth in kvz_filter_deblock_edge_{luma,chroma} with
  length.
- Move checking whether an edge needs to be filtered from functions
  kvz_filter_deblock_edge_{luma,chroma} to functions
  kvz_filter_deblock_{cu,lcu}.
- Use pixel coordinates instead of 8-pixel block coordinates in
  kvz_filter_deblock_cu.
- Add comments.

These changes should make it easier to modify the deblocking filter to
handle SMP and AMP blocks.
2015-11-18 11:12:31 +02:00
Arttu Ylä-Outinen 863bd1c55d Replace EDGE_ macros with an enum in filter.
Replaces macros EDGE_HOR and EDGE_VER with enum edge_dir.
2015-11-18 11:12:30 +02:00
Ari Koivula b6e443f3ce Fix bug in lp-gop parsing
Unnecessary mod operation resulted in 0 as the reference delta.
2015-11-16 11:58:57 +02:00
Ari Koivula cfe834bb53 Merge branch 'lowdelay_GOP'
Conflicts:
	README.md
2015-11-14 00:05:13 +02:00
Ari Koivula 5ae97b46c6 Remove redundant LP-GOP structures
These structures can now be defined with the LP-GOP syntax.
The syntax for lb is g4d3r4t1 and for ultralow it is g8d3r1t1.
2015-11-14 00:01:29 +02:00
Ari Koivula d2de2aa6aa Add "lp-g8d4r2t2" style GOP selection
This is my own syntax that I've been using when testing this feature.
It allows for defining some simple type of hierarchical GOP structures.
2015-11-14 00:01:28 +02:00
Ari Koivula b10866cb1e Fix SPS ref pic counts for lowdelay GOP 2015-11-13 23:11:11 +02:00
Ari Koivula a6a713ac02 Use P-slices for lowdelay GOPs 2015-11-13 23:11:11 +02:00
Ari Koivula 0722f461c5 Fix compiling tests on mac
The mac version of KVZ_GET_TIME macro has many statements, which
prevented it being used inside a for loop statement. Added brackets
to all versions to prevent this issue arising in the future.

Fixes #115.
2015-11-13 22:57:29 +02:00
Ari Koivula 93637f4683 Move macros in threads.h to KVZ_ namespace 2015-11-13 22:46:32 +02:00
Arttu Ylä-Outinen 95fb2ed9ed Fix unexpected behavior of --deblock option.
Giving a single number as an argument to --deblock option would enable
deblocking and set both beta and tc to that value. This commit changes
a single number argument to be interpreted as a boolean specifying
whether to enable deblocking or not. As a result, "--deblock 0" can be
now used to disable deblocking.

This fixes deblocking being enabled in all presets.
2015-11-13 14:22:42 +02:00
Arttu Ylä-Outinen 60ad19d0c8 Fix --deblock option.
Fixes --deblock option so that it takes a "beta:tc" argument as
advertised in the README and command line help.
2015-11-13 14:22:42 +02:00
Arttu Ylä-Outinen 8960ce369e Reject extra command line arguments.
Changes the command line program to print an error and exit instead of
silently ignoring non-option arguments.
2015-11-13 13:57:00 +02:00
Arttu Ylä-Outinen e42f1351f9 Call config_parse through the api struct in cli.
Replaces a call to kvz_config_parse with api->config_parse.
2015-11-09 14:31:04 +02:00
Arttu Ylä-Outinen 87ca9e1856 Drop an unnecessary call to kvz_threadqueue_flush.
Removes threadqueue dependency from the command line program.
2015-11-09 14:31:04 +02:00
Arttu Ylä-Outinen b1abe65e83 Move kvz_get_padding to encmain. 2015-11-09 14:31:03 +02:00
Arttu Ylä-Outinen 0eb1e710e6 Move PSNR computation from videoframe to encmain.
Moves function kvz_videoframe_compute_psnr to encmain and renames it to
compute_psnr. Removes videoframe dependency from the command line
program.
2015-11-09 13:50:43 +02:00
Arttu Ylä-Outinen 940ada4c0d Mark AVX2 intra filter functions as static.
Marks functions filter_4x4_avx2, filter_16x16_avx2 and filter_NxN_avx2
static as they are not used outside strategies/avx2/intra-avx2.
2015-11-09 12:48:20 +02:00
darealshinji f51e3847b6 Fix cross-building on Linux 2015-11-06 21:53:44 +01:00
Marko Viitanen 94bec1b444 Cleanup of mv-rdo, removed unused functions 2015-11-05 14:40:06 +02:00
Marko Viitanen 0cb57961b0 Use dynamically selected get_mvd_cost function for MV candidate selection 2015-11-05 14:31:37 +02:00
Marko Viitanen bb4f50aded Added mv-rdo commandline parameter and use it in presets 2015-11-05 13:59:30 +02:00
Marko Viitanen 4e7e9eefbf Enable usage of MV RDO with a config parameter (in hexbs, tz, frac, bipred) 2015-11-05 12:24:03 +02:00
Marko Viitanen 9a535e1c56 Added missing kvz_ prefixes and fixed some warnings 2015-11-05 09:07:59 +02:00
Marko Viitanen 822c174377 Set cabac to only count bits 2015-11-05 09:07:59 +02:00
Marko Viitanen 1ed0d85020 Added a function for cabac mvd coding cost get_mvd_coding_cost_cabac()
Conflicts:
	src/rdo.c
2015-11-05 09:07:59 +02:00
Marko Viitanen 6a2658cc74 Added calc_mvd_cost_cabac() to calculate real bits used for motion vectors
Conflicts:
	src/rdo.h
	src/search_inter.c

Conflicts:
	src/rdo.c
2015-11-05 09:07:59 +02:00
Ari Lemmetti fbd0596114 Merge branch 'avx2-pixels-blit' 2015-11-04 11:06:10 +02:00
Ari Lemmetti 57ea7d223b Pass SIMD registers to functions as pointers to fix 32-bit compilation in visual studio 2015-11-04 10:51:26 +02:00
Ari Lemmetti a3855652e9 Add AVX2 version with separate handling of basic blocks and strideless copy. 2015-11-04 10:07:25 +02:00
Ari Lemmetti 0816fbea2c Create generic strategy of blit function 2015-11-04 10:07:25 +02:00
Ari Koivula 5c1ff57f9f Add corresponding option for every "--no-X" option
Needed in order to turn back options turned off by presets.
2015-11-04 00:12:26 +02:00
Ari Koivula 8d9e8aad73 Fix lambda calculation to match HM
The lambda was not being increased for non-key frames and was different
in other ways too. The new implementation matches HM.
2015-11-03 16:49:42 +02:00
Ari Koivula ba47b3cdb1 Make --preset accept numbers
Ultrafast corresponds to 0 and placebo to 9.
2015-11-03 15:46:23 +02:00
Ari Koivula 74ee2f3b27 Redefine presets and include them in README. 2015-11-03 15:26:34 +02:00
Marko Viitanen 27e743a507 Added a commandline option for using a preset
- Defined presets: ultrafast, superfast, veryfast, faster, fast, medium,
                    slow, slower, veryslow, placebo
2015-11-03 12:25:06 +02:00
Marko Viitanen 641c204277 Use lowdelay flag in GOP for not using input picture caching
- Reduced layers to 3 in LB
2015-11-02 12:36:41 +02:00
Marko Viitanen 9a99f7972f New GOP structure for ultralow delay 2015-11-02 11:33:16 +02:00
Marko Viitanen 388986399f Added a definition for low-delay B GOP structure 2015-11-02 10:53:06 +02:00
Marko Viitanen 821d5c478b Added missing parameter to kvz_strategy_register_picture_generic() 2015-11-02 08:55:54 +02:00
Ari Lemmetti 6dce1f1e33 Update versions for a new release 2015-10-30 17:31:55 +02:00
Ari Lemmetti d71f1b5bd0 Disable incompatible optimizations for 32-bit version 2015-10-24 15:32:27 +03:00
Ari Lemmetti df995d85e8 Utilize AVX2 for dequantization. 2015-10-23 20:17:08 +03:00
Ari Lemmetti cf347e33c4 Move dequant to strategies. Copy generic to AVX2 as well. 2015-10-23 19:53:50 +03:00
Ari Lemmetti 47082738aa ...and the same tricks for quantized reconstruction 2015-10-23 19:44:38 +03:00
Ari Lemmetti 7961ba80d8 Add functions for bigger block sizes to calculate more residual simultaneously and reduce memory accesses 2015-10-23 19:11:56 +03:00
Ari Lemmetti 15edd5060d Load and store multiple elements simultaneously. Use 128-bit wide zero
test. *wip*
2015-10-23 17:03:16 +03:00
Ari Lemmetti b37cca87c8 Copy generic to avx2 2015-10-23 17:03:15 +03:00
Ari Lemmetti cad2ea9d6e Move quantize_residual to quant strategies. 2015-10-23 17:03:15 +03:00
Ari Lemmetti c013e58f0c Merge branch 'avx2-faster-angular' 2015-10-23 16:54:35 +03:00
Ari Lemmetti 0c63041ba7 Add filtering functions for different block sizes. Simplify logic a bit to reduce branching. Sorry for the large commit! 2015-10-23 16:54:15 +03:00
Arttu Ylä-Outinen f7b6365db8 Merge pull request #109 from lu-zero/master
version: Bump
2015-10-23 12:26:01 +03:00
Luca Barbato 7ecd9c7284 version: Bump
d5f3778f72 provided a new interface
2015-10-23 10:02:28 +02:00
Arttu Ylä-Outinen 1cf55f066f Fix memory leak in encoder_headers.
The header data was not freed when data_out was NULL.
2015-10-23 09:55:08 +03:00
Arttu Ylä-Outinen a1272e98f8 Prevent disabling VPS from command line.
Disabling VPS when using the command line encoder would result in an
invalid bitstream.
2015-10-19 11:25:29 +03:00
Arttu Ylä-Outinen 024fedff57 Disable writing VPS when vps_period is negative.
Turns vps_period in struct encoder_control_t into a signed value.
Negative values are interpreted as "never send parameter sets."
2015-10-19 11:25:18 +03:00
Arttu Ylä-Outinen d5f3778f72 Add function encoder_headers to API.
This provides means for obtaining the VPS, SPS and PPS separately from
the rest of the bitstream.
2015-10-16 11:47:27 +03:00
Arttu Ylä-Outinen 037b72c72b Add parameter stream to VPS, SPS and PPS encoding. 2015-10-14 14:40:45 +03:00
Arttu Ylä-Outinen db17d33b0b Simplify code in encoder_state-bitstream. 2015-10-14 12:37:26 +03:00
Luca Barbato 15fd8241a9 build: Replace a sed expression with a simpler awk
The former does not work for sure on macosx.
2015-10-10 12:42:24 +02:00
Luca Barbato a44d24ce40 build: Drop a trailing space 2015-10-10 12:42:10 +02:00
Ari Lemmetti 5af7a42ebe Enable AVX2 strategy. Add first version of optimizations. 2015-10-08 12:36:20 +03:00
Ari Lemmetti f4fe3dca5e Add AVX2 strategy. Copy generic implementation there. 2015-10-08 12:36:15 +03:00
Ari Lemmetti 54e8b346a3 Add intra strategy. Move angular prediction there. 2015-10-08 12:36:05 +03:00
Ari Lemmetti c123b97fec Remove option -fno-lto from strategies. LTO is no longer used anyway. 2015-10-05 19:34:56 +03:00
Ari Koivula ff976e2afc Arrange parameters in intra fancily 2015-10-05 06:23:14 +03:00
Ari Koivula d83d57df1a Fix function names in intra
Prefix non-static functions with kvz_intra_ and static with intra_.
2015-10-05 06:23:14 +03:00
Ari Koivula 30b4fa4247 Rename intra prediction to kvz_intra_predict 2015-10-05 06:23:14 +03:00
Ari Koivula 7280dbf429 Remove unnecessary function
This function used to be more complicated, but now it's so simple that
it's just obfuscating what's happening.
2015-10-05 06:23:05 +03:00
Ari Koivula 1221e4c7d2 Remove old intra prediction code. 2015-10-05 05:30:47 +03:00
Ari Koivula 23439557e6 Remove remaining usages of old intra prediction 2015-10-05 05:23:22 +03:00
Ari Koivula ca3ba997aa Switch to new intra pred in search_intra_chroma_rough. 2015-10-05 05:03:58 +03:00
Ari Koivula eaff6e29d9 Switch to new intra pred in kvz_search_cu_intra 2015-10-05 04:00:42 +03:00
Ari Koivula 55d741e250 Switch to new intra pred in kvz_intra_recon_lcu_chroma 2015-10-05 04:00:42 +03:00
Ari Koivula 678a1dd1dd Switch to new intra pred in kvz_intra_recon_lcu_luma 2015-10-05 02:29:02 +03:00
Ari Koivula cd2f1797bf Add reimplemented intra prediction code
Just along side for now to help with debugging.

The main difference with the new versions is that they take and output
width**2 blocks and two width*2+1 arrays of reference samples,
instead of the (2*width+8)**2 blocks the old ones do. This should make
the interface clearer and the memory footprint smaller.

Also commented the shit out of angular prediction, so hopefully Ari L.
will have an easier time with a SIMD implementation.
2015-10-05 02:29:02 +03:00
Ari Koivula 115756b9d7 Accept --rd=3 parameter 2015-10-05 02:28:56 +03:00
Ari Lemmetti 7a3dabf43e Merge branch 'avx2-quant' 2015-10-02 16:31:33 +03:00
Ari Lemmetti 38106afa50 Add AVX2 version of quantization. 2015-10-02 16:18:52 +03:00
Ari Lemmetti ef0ad292ef Add quantization strategy. 2015-10-02 16:17:02 +03:00
Ari Koivula 41dd44f7cf Fix warnings with -DNDEBUG 2015-10-02 15:13:07 +03:00
Ari Koivula 81f5ca76cb Accept tile configurations with either dimension as one 2015-10-02 15:06:31 +03:00
Ari Lemmetti 989cee1b04 Add 4x4 function as well 2015-10-01 22:14:56 +03:00
Ari Lemmetti 8b57b2bb1a Refactor SATD to inline most of the function. Replace full horizontal add with shuffle and regular packed add. 2015-10-01 21:29:25 +03:00
Ari Lemmetti 55da2a9958 Add intrinsic version of SATD for 8x8 and larger blocks 2015-10-01 19:42:22 +03:00
Ari Lemmetti d68fc4c41e Add header for common utilities to use with strategies. 2015-10-01 19:40:35 +03:00
Arttu Ylä-Outinen 512e5bb25f Bump version to 0.7.0 2015-09-30 15:20:57 +03:00
Arttu Ylä-Outinen 8f404a3b6f Add NAL unit type to frame_info. 2015-09-28 10:30:59 +03:00
Arttu Ylä-Outinen 1c898a2f4a Prefix NAL unit type enum constants with KVZ_. 2015-09-28 10:30:58 +03:00
Arttu Ylä-Outinen 4e5c7fe6e8 Remove function kvz_encoder_compute_stats.
Changes main function to compute frame PSNR by calling
kvz_videoframe_compute_psnr directly with the source and reconstructed
pictures returned from encoder_encode.
2015-09-28 10:30:58 +03:00
Arttu Ylä-Outinen efd361ee8e Return the original picture from encoder_encode. 2015-09-28 10:30:58 +03:00
Arttu Ylä-Outinen afd0d3eee0 Remove encoderstate dependency from cli module.
Changes function print_frame_info to use a kvz_frame_info struct to get
the data to be printed.
2015-09-28 10:30:58 +03:00
Arttu Ylä-Outinen 7edc1b0b1c Add reference picture lists to kvz_frame_info. 2015-09-28 10:30:57 +03:00
Arttu Ylä-Outinen d5dceb45f1 Factor out a function for building ref lists.
The code for building the reference picture lists was duplicated in
functions encoder_state_ref_sort and print_frame_info. This commit moves
it to a new function kvz_encoder_get_ref_lists. Also makes
encoder_ref_insertion_sort static since it is not used outside the
encoderstate module any more.
2015-09-28 10:30:57 +03:00
Arttu Ylä-Outinen c856a6b598 Output frame info from encoder_encode.
Adds a new output parameter info_out to encoder_encode. It returns
a struct containing information about the encoded frame, including POC,
QP and slice type.
2015-09-28 10:30:57 +03:00
Arttu Ylä-Outinen 173b70b53f Rename SLICE_* enum constants to KVZ_SLICE_*. 2015-09-28 10:30:56 +03:00
Ari Koivula 63ab4068be Clean up the makefile a bit
Use the existing TARGET_CPU_ARCH and TARGET_CPU_BITS instead of filtering
ARCH over and over again.

Comment some of the more obscure parts.
2015-09-18 15:13:13 +03:00
Ari Koivula eb12fe0d98 Re-enable disabled -m32 and -m64 flags. 2015-09-18 12:13:31 +03:00
Ari Koivula 9537b996e7 Make makefile work on arm
Only compile x86 specific optimizations for x86 and don't give
-m32 or -m64 on arm.
2015-09-18 00:23:49 +03:00
Ari Koivula d76890bbff Bump version to 0.6.1 2015-09-16 18:42:20 +03:00
Ari Koivula 1d5cfbdcc2 Remove unused variable. 2015-09-16 18:39:46 +03:00
Ari Koivula 513e80bcca Fix bug causing unnecessary copying of memory
This bug caused a single tiles worth of lcu_info_t structs to be copied
unnecessarily for every LCU in the frame. This obviously caused huge
memory bandwidth issues when coding large frames without tiles. The
effect was minimized somewhat with a large number of tiles, because
only the current tile was copied.

From context it is clear that this piece of code was supposed to copy
a single tile or frame, once the frame was done, but because it was
placed in a function which is called for every LCU, it copied the data
for the LCU, but also lots of extra stuff.

The fix is to copy only the current LCU instead of the whole tile.
2015-09-16 18:23:44 +03:00
Marko Viitanen d8b50d6951 Bump version to 0.6.0 2015-09-15 15:44:55 +03:00
Marko Viitanen 5b3f2a6229 Merge branch 'pkgconfig-fix' 2015-09-15 15:37:49 +03:00
Ari Koivula f1ac0e6bc2 Rename _DEBUG to KVZ_DEBUG 2015-09-15 13:04:03 +03:00
Ari Koivula ec2d8d6ad7 Rename _DEBUG_PERF macros to KVZ_PERF
And move them to threadqueue.h, where the things that use them are.
2015-09-15 13:03:32 +03:00
Arttu Ylä-Outinen 4db06bcf07 Use correct version in kvazaar.pc.
Changes kvazaar.pc to use kvazaar version instead of the library
version. The version number is extracted from global.h using sed.
2015-09-15 12:57:34 +03:00
Marko Viitanen 3217e70f99 Revert "Revert "Fix keeping of reference frames over IDR boundary.""
This reverts commit 87936eb99f.

Conflicts:
	src/encoderstate.c
2015-09-14 14:31:58 +03:00
Arttu Ylä-Outinen b4ec664fc9 Set DTS values of output pictures.
Adds field dts to struct kvz_picture and rewrites kvz_encoder_feed_frame
to set the DTS when returning pictures.
2015-09-14 14:16:56 +03:00
Arttu Ylä-Outinen 25c23aa298 Remove static variables from kvz_encoder_feed_frame.
Adds struct input_frame_buffer_t for storing the input buffer state.
2015-09-14 14:12:19 +03:00
Arttu Ylä-Outinen 1d2a398197 Move function kvz_encoder_feed_frame to a separate module.
Adds module input_frame_buffer.
2015-09-14 14:12:18 +03:00
Arttu Ylä-Outinen 009717bf7c Remove unused field bitstream_length from kvz_encoder. 2015-09-14 14:12:18 +03:00
Arttu Ylä-Outinen 97913cee40 Add pts field to kvz_picture.
The pts field can be used to set the presentation timestamp of the input
frames. The timestamps are copied to the reconstructed frames.
2015-09-14 14:12:00 +03:00
Ari Koivula 24618c90ce Fix wrong type in debug code.
- This type is expected by outside debug scripts. It does not have to
  match the function name.
2015-09-10 16:07:18 +03:00
Ari Koivula 0ac2bc31a3 Put parenthesis around _DEBUG.
- To protect against precedence issues.
2015-09-10 16:06:19 +03:00
Ari Koivula 3958e8b6f7 Handle VS warnings with _DEBUG.
- Conditional expression is constant was being triggered by debug code.
2015-09-10 14:16:42 +03:00
Ari Koivula cb1a206c74 Dump threading data structures only with _DEBUG_PRINT_THREADING_INFO.
- They are usually not needed when using _DEBUG.
2015-09-10 14:16:42 +03:00
Arttu Ylä-Outinen 70b3e10e27 Fix a crash with owf=4, gop=8, frames=10.
A call to kvz_threadqueue_waitfor caused the tqj_bitstream_written field
of the previous encoder state to become a dangling pointer, subsequently
causing an assertion to fail. This would only occur when the encoder
state used for a new frame was not the last finished one.

Fixed by setting tqj_bitstream_written to NULL after the job is done and
removing unnecessary calls to kvz_threadqueue_waitfor.
2015-09-07 15:37:04 +03:00
Arttu Ylä-Outinen 3c35f470a1 Fix get_ctx_cu_split_model. 2015-09-02 11:47:03 +03:00
Ari Koivula 9a23ae3d92 Resolve remaining Visual Studio warnings.
- Ignore most of them and fix the ones that can't be ignored.
2015-08-31 15:02:25 +03:00
Luca Barbato efe5291427 build: Drop the gnu-only option Deterministic
Unbreak building on MacOSX and possibly other BSDs.
2015-08-29 10:38:10 +02:00
Ari Koivula c52f7858ab Use long start code in picture_timing_sei if it's first NAL in AU. 2015-08-27 15:23:34 +03:00
Ari Koivula 69d1059602 Fix access unit delimiter.
- The nal header was written after the pic_type.
2015-08-27 15:18:25 +03:00
Ari Koivula 9584cd7352 Move rbsp_trailing_bits elements to encapsulating functions.
- Also add missing bitstream align. It's unnecessary as the version can't not
  be byte aligned.
2015-08-27 15:18:18 +03:00
Ari Koivula 207367f317 Add new kvz_bitstream_align which only aligns when needed.
- Changing picture_timing_sei_message to align doesn't change anything, but
  protects against future changes if more data is added there in future.
2015-08-27 15:16:20 +03:00
Ari Koivula b2fb1b6d4a Rename kvz_bitstream_align to kvz_bitstream_rbsp_trailing_bits.
- The syntax is called rbsp_trailing_bits in spec and 1 byte is added
  even when the bitstream is already aligned, so align is a bad name.
2015-08-27 14:33:30 +03:00
Arttu Ylä-Outinen 3a10e9e3e0 Prefix all non-static symbols with "kvz_". 2015-08-26 13:02:28 +03:00
Arttu Ylä-Outinen bfe2b31cee Make generic satd functions static. 2015-08-26 12:10:27 +03:00
Arttu Ylä-Outinen d0bc58a874 Document the library API in more detail. 2015-08-26 12:10:27 +03:00
Arttu Ylä-Outinen 04ba5dca41 Make config_destroy accept a NULL pointer. 2015-08-26 12:10:26 +03:00
darealshinji 87a4e53c35 Makefile: add $(CPPFLAGS), use -Wl,-z,noexecstack and install development files on Linux 2015-08-22 16:00:26 +02:00
Ari Lemmetti 3661f3f3f5 Fix incorrect free on error 2015-08-21 18:26:12 +03:00
Ari Lemmetti 4103bd2786 Add missing padding for frame allocation 2015-08-21 17:25:54 +03:00
Ari Lemmetti d6c3363dc8 Merge branch 'interlacing_experimental' 2015-08-21 15:50:24 +03:00
Ari Lemmetti ed7948810e Fix help message and update README.md 2015-08-21 15:29:48 +03:00
Ari Lemmetti 68fcc67a16 Add extraction of fields according to source scan type 2015-08-21 15:15:20 +03:00
Ari Lemmetti 581ff95412 Write flags and SEI messages for interlacing. 2015-08-21 14:46:05 +03:00
Arttu Ylä-Outinen 38893d2da1 Use the static lib to link the program.
Changes the Makefile to use the static library and additional object
files as linker input instead of all the object files when linking the
command line program.
2015-08-20 16:42:29 +03:00
Arttu Ylä-Outinen cb49586d36 Add static library target to Makefile.
Adds targets libkvazaar.a and install-static to the Makefile.
2015-08-20 16:42:29 +03:00
Arttu Ylä-Outinen 23159e45b7 Rename LIB_OBJS to SHARED_OBJS in Makefile.
SHARED_OBJS is a more appropriate name since the objects are only used
for building shared libraries, not static ones.
2015-08-20 16:42:28 +03:00
Arttu Ylä-Outinen 8488ba1bb7 Remove unnecessary objects from libraries.
Removes the cli and yuv_io modules from libraries.
2015-08-20 16:42:28 +03:00
Arttu Ylä-Outinen dd874a0a4a Move writing of reconstructed picture to encmain.
- Removes parameter recout of function encoder_compute_stats.
- Now only encmain uses the yuv_io module.
2015-08-20 16:42:28 +03:00
Ari Lemmetti 923f4a74d5 Fix filtering over limits 2015-08-17 17:39:56 +03:00
Ari Lemmetti febe66f148 Fix typo 2015-08-17 16:16:11 +03:00
Ari Lemmetti 82cf4e8ff4 Output error messages to stderr 2015-08-17 15:01:46 +03:00
Ari Lemmetti 3da71b62bf Add checks if malloc fails 2015-08-17 15:01:46 +03:00
Ari Lemmetti 4718fe7fda Change variable names to match used convention 2015-08-17 15:01:46 +03:00
Ari Lemmetti 6a5eaf08de Rename extend_borders to get_extended_block. Add kvz_ prefix to type definition. 2015-08-17 15:01:46 +03:00
Ari Lemmetti d82582c37c Changes to extend border function.
Now outputs a pointer to a block with guaranteed padding for filtering.
Only generate extra pixels if samples are needed out of bounds.
Use memcpy otherwise.
2015-08-17 15:01:46 +03:00
Ari Lemmetti fc038cb8bf Add --source-scan-type parameter
Options progressive (default)
        tff for top field first
        bff for bottom field first
2015-08-13 12:53:14 +03:00
Ari Lemmetti 4dcc0d876d Fix rate control when --owf=auto (default)
Value in cfg stays -1 if auto selection is used.
Use value in encoder instead.
2015-08-12 18:20:57 +03:00
Ari Lemmetti 5d96dbc6c0 Make strategy selection use bit depth given via parameter instead of excluding registration with defines 2015-08-12 13:33:38 +03:00
Ari Lemmetti 4122f36089 Prevent the registration of strategies that are incompatible when KVZ_BIT_DEPTH != 8
Remove unnecessary or misleading mentions of "8bit"
2015-08-12 11:29:53 +03:00
Ari Lemmetti 33b6481660 Remove unused variables. 2015-08-11 15:53:40 +03:00
Ari Lemmetti 348d7780fc Remove third shift and offset from 14-bit sampling functions (change missing from rebase) 2015-08-11 15:06:16 +03:00
Marko Viitanen 8409317bd9 Fixed rebasing errors for 10bit branch 2015-08-11 14:56:45 +03:00
Marko Viitanen 58f12bd530 Changed frame 8bit to 10bit conversion to be done without memory allocation 2015-08-11 08:18:14 +03:00
Marko Viitanen 6453a511d7 Scale SAD/SATD costs to match bit depth
Conflicts:
	src/image.c
2015-08-11 08:18:14 +03:00
Marko Viitanen 0304b6c412 Fixed luma interpolation filter when 10bit coding and some other minor fixes 2015-08-11 08:17:48 +03:00
Marko Viitanen 450b5e64ca Fixed overflow on generic ipol filters when 10bit encoding
Conflicts:
	src/strategies/generic/ipol-generic.c
2015-08-11 08:17:48 +03:00
Marko Viitanen 191d3e4d87 Fixed RDOQ on 10bit encoding 2015-08-11 08:14:35 +03:00
Marko Viitanen 414ebe6101 Fixed checksum on bitdepth > 8 cases
Conflicts:
	src/nal.c
	src/nal.h
	src/strategies/generic/nal-generic.c
	src/strategies/strategies-nal.c
	src/strategies/strategies-nal.h
2015-08-11 08:14:35 +03:00
Marko Viitanen 57ab46f110 Small fixes all around to enable 10bit encoding
Conflicts:
	src/encmain.c
	src/encoder.c
	src/encoderstate.c
	src/global.h
2015-08-11 07:59:20 +03:00
Ari Lemmetti 7cd4f7a5c9 Enable fractional motion vectors with bipred 2015-08-10 18:49:12 +03:00
Ari Lemmetti 5887c96991 Add and use 14bit reconstruction for fractional motion vectors with bipred 2015-08-10 18:45:29 +03:00
Ari Lemmetti a87dafb27a Add hi_prec_buf_t for higher precision intermediate values for interpolation filter. 2015-08-10 18:35:06 +03:00
Ari Lemmetti 3e31ff2476 Use the function to sample half-pixels. 2015-08-10 18:30:41 +03:00
Ari Lemmetti 0a096c7040 Move qpel and octpel reconstruction in separate functions. 2015-08-10 18:25:52 +03:00
Ari Lemmetti fc00b4795c Preparations for more accurate reconstruction with bipred 2015-08-10 18:08:13 +03:00
Ari Lemmetti 8b4a6c92da Add 14bit precision sample functions. 2015-08-10 18:02:06 +03:00
Ari Lemmetti b30f17d4b8 Add fractional pixel sampling for chroma 2015-08-10 17:55:37 +03:00
Ari Lemmetti 650dd7d840 Use pixels_blit to copy neccessary pixels. 2015-08-10 17:52:00 +03:00
Ari Lemmetti 01f40ec104 Add fractional pixel sampling for luma 2015-08-10 17:51:48 +03:00
Ari Koivula 0c3c93d456 Optimize intra SAD intrinsics.
- Added 64x64 version for completeness.
- With the exception of 16x16, these were all slightly slower than the ASM
  versions, as measured by "kvazaar_test -s speed -t intra_sad", but now they
  are on par or slightly faster.
- None of these actually use any AVX2 intrinsics, and probably never will,
  unless someone adds an interface for doing more than one block at a time,
  in which case the non-destructive versions might come in handy.
2015-08-06 19:35:00 +03:00
Ari Lemmetti 20b833bc8e Fix mingw errors 2015-07-31 18:44:36 +03:00
Ari Lemmetti 12c391eb08 Add auto-detection for input resolution.
Use --input-res=auto as default.
2015-07-31 17:35:16 +03:00
Ari Koivula 0740a73fbb Clean up Makefile.
- Move stuff around.
- LDFLAGS -shared and -dynamiclib imply -fpic.
2015-07-31 15:57:05 +03:00
Ari Koivula beec2705b1 Add cli, lib-shared and lib-static to Makefile. 2015-07-31 15:57:05 +03:00
Ari Koivula 24b3306325 Fix incorrect pattern rules in Makefile.
- Having more than one rule in a pattern rule means that both of those files
  are created at the same time with the rule. This only worked for debug,
  because debug build was never done in the same invocation as release build.
2015-07-31 14:36:45 +03:00
Ari Koivula 1c27f67963 Remove -flto.
- Always use the compiler to invoke the linker. Clang will give additional
  parameters to the linker when compiled with -flto.
- Giving a different optimization level to linker did not make any difference
  in gcc-5.1.1.
2015-07-31 14:36:26 +03:00
Ari Koivula 54b1be341e Don't compile executable with PIC.
- It's required for .so and .dylib, but not for .dll or the executable.
- It might be better to use libtool for this, but I'm not ready to go that
  far yet.
2015-07-29 17:12:09 +03:00
Ari Koivula f8154f8382 Merge branch 'make-dylib' 2015-07-29 11:28:43 +03:00