hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-27 19:24:06 +00:00

Author	SHA1	Message	Date
Ari Koivula	cbfa824d1a	Merge branch 'simd'	2016-09-27 20:49:45 +03:00
Ari Koivula	14a7bcba25	Use a faster function for clipped inter SAD. Use the vectorized general SSE41 inter SAD in AVX reg_sad for shapes that don't have AVX versions yet. This also speeds up --smp and --amp considerably. Got a 1.25x speedup with these options: --preset=ultrafast -q 27 --gop=lp-g4d3r3t1 --me-early-termination=on --rd=1 --pu-depth-inter=1-3 --smp --amp. Suite speed_tests, before → after: PASS inter_sad: 0.898M → 2.503M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec); PASS inter_sad: 115.054M → 133.577M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)	2016-09-27 20:48:30 +03:00
Arttu Ylä-Outinen	4313e56c2d	Add --no-rdoq-skip command line switch	2016-09-11 17:40:16 +09:00
Ari Koivula	a7a33b08ec	Remove --slice-addresses from usage message. Also give a warning if it's used. Slices will have to be implemented at some point, but they aren't yet, so let's not advertise them.	2016-09-10 21:06:00 +03:00
Eemeli Kallio	f41e428e5f	Removed kvz_skip_unnecessary_rdoq and reworked --rdoq-skip to skip 4x4 blocks when it is on.	2016-09-09 10:26:07 +03:00
Eemeli Kallio	ed9c0b0416	RDOQ reworked in rdo.c. rdoq_signhide now skips coeffs that are after best_last_idx.	2016-09-09 10:16:51 +03:00
Ari Koivula	02cd17b427	Add faster AVX inter SAD for 32x32 and 64x64. Add implementations for these functions that process the image line by line instead of using the 16x16 function to process block by block. On Haswell, the 32x32 version is around 30% faster and the 64x64 version around 15% faster. Before → after: PASS inter_sad: 28.744M → 37.828M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec); PASS inter_sad: 7.882M → 9.081M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)	2016-09-01 21:36:39 +03:00
Ari Koivula	d0512d25c6	Use fixed point in get_mvd_coding_cost	2016-08-30 21:37:12 +03:00
Ari Koivula	ec7507a935	Further optimize get_ep_ex_golomb_bitcost. Unrolled the 16-bit log2 calculation.	2016-08-30 21:37:01 +03:00
Ari Koivula	a4ba794587	Optimize get_ep_ex_golomb_bitcost. Arrange the decision tree so that there are only 3 branches on the most common paths and the more likely branch is always the fall-through. A profile-guided optimization pass would probably do something similar.	2016-08-30 05:24:16 +03:00
Ari Koivula	82cfab58f8	Improve fast mvd coding cost estimation. A lot of time is spent in this function on ultrafast, and it doesn't do a very good job. This change aims to both simplify the logic and improve the estimate. The logic is simplified by looking up the mvd bit cost, which is a step function, instead of mimicking the binarization process. The estimate is improved by checking fractional CABAC bit costs. The new function returns the same results as kvz_get_mvd_coding_cost_cabac, but is also faster than the old function.	2016-08-30 04:55:09 +03:00
Ari Koivula	d31be8eb27	Make mvd_coding_cost functions take const cabac	2016-08-30 04:46:46 +03:00
Ari Koivula	64d631c174	Fix 8-bit to 10-bit input conversion regression	2016-08-25 22:09:40 +03:00
Ari Koivula	27789125d8	Fix input bit depth conversion. The input was being shifted in the wrong direction.	2016-08-25 22:05:25 +03:00
Ari Koivula	4ec039004b	Add monochrome encoding. Write the bitstream without chroma when encoding with --input-format=P400. Compared to coding monochrome content in 420 format, this reduces bitstream size by 0-1% and speeds up encoding slightly because chroma is not processed.	2016-08-25 20:15:26 +03:00
Ari Koivula	c5b70cf812	Add chroma format support to yuv_t	2016-08-24 19:20:53 +03:00
Ari Koivula	032ed30ff4	Add chroma format support to kvz_picture. Add picture_alloc_csp to the libkvz API to allocate pictures with a chroma format other than 420.	2016-08-24 19:20:53 +03:00
Ari Koivula	48ccc26839	Add --input-format and --input-bitdepth. Adds reading of 10-bit input for 10-bit encoding.	2016-08-24 19:20:53 +03:00
Ari Koivula	cc08073615	Refactor some indexing weirdness in init_lcu_t. I thought there might be a bug in this, so I cleaned it up.	2016-08-24 19:12:48 +03:00
Ari Koivula	b6d674d66e	Refactor integer vector inter prediction. This code was pretty bad, so I cleaned it up a bit.	2016-08-24 19:09:26 +03:00
Ari Lemmetti	28c4174d0e	Fix incorrect shuffle parameters. _MM_SHUFFLE uses reverse order.	2016-08-23 19:40:46 +03:00
Ari Lemmetti	ce77bfa15b	Replace KVZ_PERMUTE with _MM_SHUFFLE. The exact same macro already exists.	2016-08-22 19:08:46 +03:00
Jovasa	68eef660bd	Fixed search around mv_in in fullsearch not being saved.	2016-08-19 15:19:29 +03:00
Eemeli Kallio	99d8b9abeb	Changed skip_rdoq name to kvz_skip_unnecessary_rdoq. Changed the order it uses when it goes through CGs and tuned its sum calculation.	2016-08-18 14:02:56 +03:00
Eemeli Kallio	1fb4755f31	Added rdoq-skip to quant-generic.c	2016-08-18 12:17:54 +03:00
Eemeli Kallio	d20ac03ca2	Added --rdoq-skip option	2016-08-18 12:17:53 +03:00
Marko Viitanen	83cf801664	Fixed MV constraint condition in bipred	2016-08-18 08:53:17 +03:00
Marko Viitanen	5ae1c595f2	Fixed slice_temporal_mvp_enabled_flag and disabled TMVP with tiles. slice_temporal_mvp_enabled_flag should also be signalled with non-IDR I-slices.	2016-08-10 14:51:41 +03:00
Marko Viitanen	5326519182	TMVP cleanup and const qualifier fixes	2016-08-10 14:10:43 +03:00
Marko Viitanen	f40907260d	Added config parameter for TMVP and cmdline option --no-tmvp. Enabled by default; cannot be used with GOP at the moment.	2016-08-10 14:09:29 +03:00
Marko Viitanen	fd52dac1f7	Fixed TMVP scaling	2016-08-10 14:09:28 +03:00
Marko Viitanen	c664bc8cf7	Added flag collocated_ref_idx to the slice header	2016-08-10 14:09:28 +03:00
Marko Viitanen	c5f2611a38	Fixes for TMVP to work with the new CU array	2016-08-10 14:09:28 +03:00
Marko Viitanen	d85af5755b	TMVP working when only 1 ref frame	2016-08-10 14:09:28 +03:00
Marko Viitanen	39f0165efe	Fix a bug in TMVP, the reference cu_array was being overwritten	2016-08-10 14:09:27 +03:00
Marko Viitanen	adab8c327e	Clean TMVP code	2016-08-10 14:09:20 +03:00
Marko Viitanen	5fa8226ac9	Temporal merge candidate selection	2016-08-10 14:09:20 +03:00
Marko Viitanen	f83042f4a1	Temporal MV candidate selection	2016-08-10 14:09:19 +03:00
Marko Viitanen	f8671581e3	Implemented function kvz_inter_get_temporal_merge_candidates()	2016-08-10 14:09:19 +03:00
Marko Viitanen	2956bdb379	Added flag slice_temporal_mvp_enabled_flag	2016-08-10 14:09:19 +03:00
Arttu Ylä-Outinen	2a946bd88e	Rename encoder_state_t.global to frame. "Frame" is more accurate than "global" since, when OWF is used, the encoder states for each frame have their own struct.	2016-08-10 13:22:36 +09:00
Arttu Ylä-Outinen	5fbb0a8c27	Fix includes	2016-08-10 13:05:40 +09:00
Arttu Ylä-Outinen	aabf6ca3ee	Extract encoding code from encoderstate.c. Moves functions kvz_encode_coding_tree and kvz_encode_coeff_nxn from encoderstate.c to encode_coding_tree.c.	2016-08-09 22:16:50 +09:00
Arttu Ylä-Outinen	803f29be8f	Remove reconstructed picture allocation in lossless. Changes encoder_set_source_picture to set the reconstructed picture to a copy of the source picture instead of allocating a new picture when lossless coding is used.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	aaec473a19	Refactor encoder state initialization. - Moves allocation of the reconstructed picture after the source picture is set. - Extracts main state initialization to a separate function from encoder_state_new_frame. - Changes kvz_encoder_feed_frame to return the frame. - Renames some functions to better match their purpose.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	cd7024b3a5	Skip computing SSD when using lossless coding. The SSD is always zero since it is lossless.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	fbbe5d1844	Use kvz_pixels_calc_ssd for SSD in search.c. Replaces loops for computing SSDs by calling kvz_pixels_calc_ssd in search.c.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	22cc97ffb1	Fix missing field initializers.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	06b82bf888	Disable filters, trskip and signhide in lossless. When lossless coding is used, deblock and SAO are skipped, transform skip flag is not written and sign hiding is not used.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	97451ec401	Align assignments in encoder.c.	2016-08-03 14:25:08 +09:00

1944 commits
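Commit 14a7bcba25 routes block shapes that lack a dedicated AVX kernel (such as 64x63) to a general vectorized SAD. The C sketch below shows only that dispatch pattern, using scalar code. The function names, the signature, and the choice of 32x32 as the one "specialized" shape are hypothetical, and the real kernels are SIMD.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical signature: SAD between two 8-bit blocks with separate strides. */
typedef uint32_t (*sad_func)(const uint8_t *a, const uint8_t *b,
                             int width, int height, int stride_a, int stride_b);

/* General version: handles any width x height, e.g. 64x63 or 1x1. */
static uint32_t sad_general(const uint8_t *a, const uint8_t *b,
                            int width, int height, int stride_a, int stride_b)
{
  uint32_t sum = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      sum += (uint32_t)abs(a[y * stride_a + x] - b[y * stride_b + x]);
    }
  }
  return sum;
}

/* Stand-in for a shape-specific (in practice vectorized) kernel; here it
 * just fixes the shape to 32x32 so the dispatch logic can be shown. */
static uint32_t sad_32x32(const uint8_t *a, const uint8_t *b,
                          int width, int height, int stride_a, int stride_b)
{
  (void)width;
  (void)height;
  return sad_general(a, b, 32, 32, stride_a, stride_b);
}

/* Use a specialized kernel when one exists for this exact shape,
 * otherwise fall back to the general function. */
static uint32_t reg_sad_dispatch(const uint8_t *a, const uint8_t *b,
                                 int width, int height, int stride_a, int stride_b)
{
  sad_func f = (width == 32 && height == 32) ? sad_32x32 : sad_general;
  return f(a, b, width, height, stride_a, stride_b);
}
```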
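Commit 02cd17b427 replaces tiling a 32x32 or 64x64 block into 16x16 calls with kernels that walk the block one full row at a time. A scalar model (hypothetical names, no SIMD) can show that both traversal orders produce the same SAD. The speedup itself comes from the AVX implementation and is not reproduced here.

```c
#include <stdint.h>
#include <stdlib.h>

/* SAD of one w x h block, walking it one full row at a time. */
static uint32_t sad_rows(const uint8_t *a, const uint8_t *b, int w, int h, int stride)
{
  uint32_t sum = 0;
  for (int y = 0; y < h; ++y) {
    const uint8_t *ra = a + y * stride;
    const uint8_t *rb = b + y * stride;
    for (int x = 0; x < w; ++x) {
      sum += (uint32_t)abs(ra[x] - rb[x]);
    }
  }
  return sum;
}

/* Block-by-block strategy: split an n x n block (n a multiple of 16)
 * into 16x16 tiles and sum the per-tile SADs. */
static uint32_t sad_tiled16(const uint8_t *a, const uint8_t *b, int n, int stride)
{
  uint32_t sum = 0;
  for (int ty = 0; ty < n; ty += 16) {
    for (int tx = 0; tx < n; tx += 16) {
      sum += sad_rows(a + ty * stride + tx, b + ty * stride + tx, 16, 16, stride);
    }
  }
  return sum;
}
```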
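Commit ec7507a935 mentions an unrolled 16-bit log2. One common way to unroll floor(log2) for a 16-bit value is a fixed binary search of four compare-and-shift steps. This is an assumed illustration with a hypothetical helper name, not the code from the commit.

```c
#include <stdint.h>

/* floor(log2(v)) for 1 <= v <= 0xFFFF using four fixed compare-and-shift
 * steps (8, 4, 2, 1) instead of a loop. Returns 0 for v == 0 by convention. */
static unsigned log2_u16_unrolled(uint16_t v)
{
  unsigned r = 0;
  unsigned x = v;
  if (x >= 1u << 8) { r += 8; x >>= 8; }
  if (x >= 1u << 4) { r += 4; x >>= 4; }
  if (x >= 1u << 2) { r += 2; x >>= 2; }
  if (x >= 1u << 1) { r += 1; }
  return r;
}
```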
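Regarding the Exp-Golomb bit cost in a4ba794587: in HEVC the MVD remainder abs_mvd_minus2 is coded with an order-1 Exp-Golomb code, whose length is 2 * floor(log2(v + 2)) bits. The sketch below checks a small decision tree against a direct model of the binarization loop. The function names and the exact tree shape are assumptions. Whether the likely branch actually ends up as the fall-through is decided by the compiler, so the source only orders the tests from most to least common.

```c
#include <stdint.h>

/* Reference: length of an order-1 Exp-Golomb code for v, following the
 * binarization loop (prefix of ones, a terminating zero, then k suffix bits).
 * Valid for v < (1u << 30). */
static uint32_t eg1_bits_reference(uint32_t v)
{
  uint32_t bits = 0;
  uint32_t k = 1;
  while (v >= (1u << k)) {
    bits += 1;          /* one prefix '1' */
    v -= 1u << k;
    k += 1;
  }
  return bits + 1 + k;  /* terminating '0' plus k suffix bits */
}

/* Same result, 2 * floor(log2(v + 2)), with the small values that dominate
 * MVD coding tested first so they resolve after two comparisons. */
static uint32_t eg1_bits_tree(uint32_t v)
{
  uint32_t s = v + 2;
  if (s < 8) {
    if (s < 4) return 2;   /* v = 0..1 */
    return 4;              /* v = 2..5 */
  }
  if (s < 32) {
    if (s < 16) return 6;  /* v = 6..13 */
    return 8;              /* v = 14..29 */
  }
  /* Rare large values: plain floor(log2) loop, starting from 2^5. */
  uint32_t lg = 5;
  s >>= 5;
  while (s > 1) { s >>= 1; ++lg; }
  return 2 * lg;
}
```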
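Commits 82cfab58f8 and d0512d25c6 concern the estimated cost of coding a motion vector difference. Under HEVC's binarization, each MVD component costs an abs_mvd_greater0_flag; if nonzero, an abs_mvd_greater1_flag and a bypass sign bit; and if |mvd| > 1, an order-1 Exp-Golomb suffix in bypass bins. The sketch below models that structure in fixed point. The 1/256-bit scale, the mvd_ctx_costs struct, and the flag costs passed in are hypothetical stand-ins for costs derived from CABAC context states. This is not kvz_get_mvd_coding_cost_cabac.

```c
#include <stdint.h>
#include <stdlib.h>

/* Costs are in 1/256-bit units (an assumed fixed-point scale). */
#define BIT_COST_ONE 256u

/* Hypothetical: fractional costs of the two context-coded flags for one
 * MV component, e.g. looked up from the current CABAC context states. */
typedef struct {
  uint32_t gr0[2];  /* cost of abs_mvd_greater0_flag = 0 / 1 */
  uint32_t gr1[2];  /* cost of abs_mvd_greater1_flag = 0 / 1 */
} mvd_ctx_costs;

/* Bits of the order-1 Exp-Golomb suffix for v: 2 * floor(log2(v + 2)). */
static uint32_t eg1_bits(uint32_t v)
{
  uint32_t s = v + 2, lg = 0;
  while (s > 1) { s >>= 1; ++lg; }
  return 2 * lg;
}

/* Estimated cost of one MVD component (|mvd| within the HEVC MVD range). */
static uint32_t mvd_component_cost(int32_t mvd, const mvd_ctx_costs *c)
{
  uint32_t a = (uint32_t)abs(mvd);
  if (a == 0) return c->gr0[0];
  uint32_t cost = c->gr0[1] + BIT_COST_ONE;  /* greater0 = 1, plus bypass sign bit */
  if (a == 1) return cost + c->gr1[0];
  return cost + c->gr1[1] + eg1_bits(a - 2) * BIT_COST_ONE;  /* bypass EG1 bins */
}
```

Because the bypass part is a step function of |mvd|, it could also be read from a small precomputed table for common magnitudes.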
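Commits 27789125d8 and 64d631c174 fix input bit depth conversion, which the commit message describes as a shift in the wrong direction. A minimal sketch of the intended direction, assuming plain shifts with truncation when reducing depth (the helper name is hypothetical):

```c
#include <stdint.h>

/* Convert one sample between bit depths. Increasing depth (e.g. 8 -> 10)
 * is a left shift; decreasing depth is a right shift that truncates. */
static uint16_t convert_sample(uint16_t s, int in_depth, int out_depth)
{
  if (out_depth >= in_depth) return (uint16_t)(s << (out_depth - in_depth));
  return (uint16_t)(s >> (in_depth - out_depth));
}
```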
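Commits 4ec039004b, c5b70cf812 and 032ed30ff4 add chroma formats other than 420, including monochrome (--input-format=P400). The amount of sample data each format carries can be sketched as follows. The enum and helpers are hypothetical and do not mirror libkvz's types, and even picture dimensions are assumed.

```c
#include <stddef.h>

/* Hypothetical enum for the four chroma formats (4:0:0 = monochrome). */
typedef enum { CSP_400, CSP_420, CSP_422, CSP_444 } chroma_format;

/* Samples in ONE chroma plane of a width x height picture. 4:0:0 has no
 * chroma planes, so a monochrome encode has no chroma to process. */
static size_t chroma_plane_samples(chroma_format f, size_t width, size_t height)
{
  switch (f) {
    case CSP_400: return 0;
    case CSP_420: return (width / 2) * (height / 2);
    case CSP_422: return (width / 2) * height;
    case CSP_444: return width * height;
  }
  return 0;
}

/* Total samples: one luma plane plus two chroma planes. */
static size_t picture_samples(chroma_format f, size_t width, size_t height)
{
  return width * height + 2 * chroma_plane_samples(f, width, height);
}
```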
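Commits 28c4174d0e and ce77bfa15b concern the argument order of _MM_SHUFFLE. Its first argument selects the source for the highest lane, which is easy to read backwards. The sketch below reproduces the macro's bit layout under a hypothetical name and models _mm_shuffle_epi32 in scalar C so it runs without SSE. With the real intrinsics, reversing the four 32-bit lanes is _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3)).

```c
#include <stdint.h>

/* Same bit layout as _MM_SHUFFLE from <xmmintrin.h>: the FIRST argument
 * picks the source lane for the HIGHEST destination lane. */
#define SHUFFLE_IMM(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))

/* Scalar model of _mm_shuffle_epi32: destination lane i takes
 * source lane (imm >> (2 * i)) & 3. */
static void shuffle_epi32_model(const int32_t src[4], int32_t dst[4], int imm)
{
  for (int i = 0; i < 4; ++i) {
    dst[i] = src[(imm >> (2 * i)) & 3];
  }
}
```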
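The TMVP series (for example fd52dac1f7, "Fixed TMVP scaling") scales the collocated block's motion vector by the ratio of POC distances. For reference, the temporal MV scaling arithmetic from the HEVC specification can be sketched in C as below. It follows the standard's formulas rather than any code from this repository, and the function names are hypothetical.

```c
#include <stdlib.h>

static int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

/* floor(v / 2^n), avoiding implementation-defined >> on negative values. */
static int floor_shift(int v, int n)
{
  return v >= 0 ? (v >> n) : -((-v - 1) >> n) - 1;
}

/* Scale one MV component of the collocated block.
 * tb: POC distance from the current picture to its reference.
 * td: POC distance from the collocated picture to its reference (must be nonzero). */
static int scale_mv_component(int mv, int tb, int td)
{
  tb = clip3(-128, 127, tb);
  td = clip3(-128, 127, td);
  int tx = (16384 + abs(td) / 2) / td;                    /* truncating division */
  int scale = clip3(-4096, 4095, floor_shift(tb * tx + 32, 6));
  int prod = scale * mv;
  int mag = (abs(prod) + 127) >> 8;                       /* round the magnitude */
  return clip3(-32768, 32767, prod < 0 ? -mag : mag);
}
```

With tb == td == 1 the scale factor is 256, so the vector is returned unchanged.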