hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-24 10:34:05 +00:00

Author	SHA1	Message	Date
Pauli Oikkonen	84cf771dea	Unroll 32 and 16 wide SAD vector implementations by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	5df5c5f8a4	Cast all pointers to const types in vector SAD funcs Also tidy up the pointer arithmetic	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a711ce3df5	Inline fixed width vectorized SAD functions	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	6504145cce	Remove 16-pixel wide AVX2 SAD implementation At least on Skylake, it's noticeably slower than the very simple version using SSE4.1	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4cb371184b	Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	796568d9cc	Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4d45d828fa	Use constant-width SSE4.1 SAD funcs for AVX2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	2eaa7bc9d2	Move SSE4.1 SAD functions to separate header	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	d2db0086e1	Create constant width SAD versions for 8 and 16 pixels	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a13fc51003	Include a blank AVX2 strategy registration function even in non-AVX2 builds	2019-02-04 19:52:24 +02:00
Pauli Oikkonen	d55414db66	Only build AVX2 coeff encoding when supported ..whoops	2019-02-04 19:34:30 +02:00
Pauli Oikkonen	3fe2f29456	Merge branch 'encode-coeffs-avx2'	2019-02-04 18:52:31 +02:00
Pauli Oikkonen	722b738888	Fix more naming issues	2019-02-04 16:05:43 +02:00
Pauli Oikkonen	e26d98fb75	Rename a couple variables and add crucial comments	2019-02-04 15:57:07 +02:00
Pauli Oikkonen	f186455619	Move encode_last_significant_xy out of strategy modules It's the exact same in both AVX2 and generic, and does not seem to be worth even trying to vectorize	2019-02-04 14:55:41 +02:00
Pauli Oikkonen	3f7340c932	Fine-tune pack_16x16b_to_16x2b Avoid mm_set1 operation when it's possible to create the constant with one bit-shift operation from another instead. Thanks Intel for 3-operand instruction encoding!	2019-02-04 14:44:47 +02:00
Pauli Oikkonen	314f5b0e1f	Rename 16x2b cmpgt function, comment it better, optimize it slightly Eliminate an unnecessary bit masking to make it even more messy	2019-02-04 14:44:32 +02:00
Pauli Oikkonen	d8ff6a6459	Fix _andn_u32 to work on old Visual Studio	2019-02-01 15:34:42 +02:00
Pauli Oikkonen	26e1b2c783	Use (u)int32_t instead of (unsigned) int in reg_sad_sse41	2019-01-10 14:37:04 +02:00
Pauli Oikkonen	3a1f2eb752	Prefer SSE4.1 implementation of SAD over AVX2 It seems that the 128-bit wide version consistently outperforms the 256-bit one	2019-01-10 13:48:55 +02:00
Pauli Oikkonen	9b24d81c6a	Use SSE instead of AVX for small widths Highly dubious if this will help performance at all	2019-01-07 20:12:13 +02:00
Pauli Oikkonen	b2176bf72a	Optimize SSE4.1 version of SAD Make it use the same vblend trick as AVX2. Interestingly, on my test setup this seems to be faster than the same code using 256-bit AVX vectors.	2019-01-07 19:40:57 +02:00
Pauli Oikkonen	887d7700a8	Modify AVX2 SAD to mask data by byte granularity in AVX registers Avoids using any SAD calculations narrower than 256 bits, and simplifies the code. Also improves execution speed	2019-01-07 18:53:15 +02:00
Pauli Oikkonen	7585f79a71	AVX2-ize SAD calculation Performance is no better than SSE though	2019-01-07 16:26:24 +02:00
Pauli Oikkonen	ab3dc58df6	Copy SAD SSE4.1 impl to AVX2	2019-01-03 18:31:57 +02:00
Pauli Oikkonen	45ac6e6d03	Tidy pack_16x16b_to_16x2b comments	2019-01-03 16:37:05 +02:00
Ari Lemmetti	cd818db724	Add missing quantization and residual in cost calculation (inter rd=2).	2018-12-21 15:55:29 +02:00
Pauli Oikkonen	016eb014ad	Move packing 16x16b -> 16x2b into separate function	2018-12-20 10:51:44 +02:00
Ari Lemmetti	b234897e8a	Fix smp and amp blocks in fme and revert previous change. Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc. Calculate SATD on the 8x4, ... part	2018-12-19 21:30:53 +02:00
Pauli Oikkonen	9aaa6f260d	Fixes to enable portability	2018-12-18 20:42:09 +02:00
Pauli Oikkonen	2fdbbe9730	Move CG reordering code from quant-avx2 to shared header	2018-12-18 19:42:18 +02:00
Pauli Oikkonen	d02207306d	Create a header file for shared AVX2 code	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	361bf0c7db	Precompute >=2 coeff encoding loop with 2-bit arithmetic Who needs 16x16b vectors when you can do practically the same with 16x2b pseudovectors in 32-bit general purpose registers!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	940b0e9e6a	Require BMI2 for AVX2 build Any processor implementing AVX2 should also implement BMI2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	f66cb23d5b	Optimize greater1 encoding loop Calculating the c1 variable need not be a serial operation!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	8c8b791c35	Vectorize kvz_context_get_sig_ctx_inc	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	033261eb74	Eliminate two branches using bit magic	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c4434e8d04	Scan CG's in forward order to simplify finding last significant	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	efd097f5a5	Vectorize the coeff group loop to some extent	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	a01362e638	use the efficient method of reordering raster->scan	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	50a888e789	Use the efficient method to find first and last nz coeffs in block	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	7e9203f566	Scan coeff groups in scan order to help find last significant one	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	9a5a6fdbc7	Simplify two ifs in encode_coeff_nxn-avx2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	37a2a8bac8	See if loop can be optimized by rearranging	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	584f2f74b6	Vectorize significant coeff group scanning loop	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	1bfed73221	Add AVX2 strategy for encode_coding_tree	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c3a6f3112a	Add generic strategy group for encode_coding_tree	2018-12-18 19:41:09 +02:00
Marko Viitanen	1ef851ab4b	Disable FME on amp/smp blocks with width or height not divisible by 8	2018-12-18 10:28:21 +02:00
Joose Sainio	b71c5573f0	Merge branch 'rate_control_fix'	2018-12-17 12:39:27 +02:00
Sergei Trofimovich	68a70e45a1	x86 asm: mark stack as non-executable Gentoo's `scanelf` QA tool detects writable/executable stack of assembly-written files as: ``` $ scanelf -qRa . 0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o 0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o 0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-sad.o 0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-satd.o ``` Normally C compiler emits non-executable stack marking (or GNU assembler via `-Wa,--noexecstack`). The change adds non-executable stack marking for yasm-based assembly files. https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>	2018-12-16 11:31:56 +00:00
Reima Hyvönen	1fcc5c6a8d	Merge branch 'bipred_recon'	2018-12-11 09:59:35 +02:00
Reima Hyvönen	e4a10880f3	Added case 12 to bipred_recon no mov	2018-12-11 09:52:17 +02:00
Marko Viitanen	a4f3968e52	Fix Visual Studio errors by initializing some variables used in AVX2 signhiding	2018-12-11 09:33:26 +02:00
Ari Lemmetti	ac943147e3	Calculate satd cost for whole non-square blocks as well.	2018-12-10 17:04:29 +02:00
Pauli Oikkonen	c465578048	Add a descriptive comment to coefficient reordering	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	f78bf2ebcb	Optimize q_coefs usage for indexed fetch	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	d9591f1b49	Eliminate midway buffering of reordered coefs TODO: For some mysterious reason seems slightly slower than the buffered one	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	7fe454c51f	Optimize get_cheapest_alternative()	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	6bbd3e5a44	Optimize rearrange_512 function	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	cb8209d1b3	Vectorize transform coefficient reordering loop	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	7cf4c7ae5f	Rename "reduce" functions to hsum That's what the functions fundamentally do anyway	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	316cd8a846	Fix ALIGNED keyword and grow alignment to 64B	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	1befc69a4c	Implement sign bit hiding in AVX2	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	c5cd03497e	Require BMI and ABM instruction sets for AVX2 build AVX2 support on a processor should always imply BMI and ABM support. The lzcnt and tzcnt instructions have more suitable semantics in the corner case that source word is 0, and allow us to even handle that scenario without a branch. Apparently Visual Studio will already include this support when building with AVX2 enabled, so only the automake files need to be tweaked.	2018-12-03 15:36:32 +02:00
Reima Hyvönen	f8696b54a4	Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks that can be not equal to 8 (ie. width = 12)	2018-11-20 17:09:19 +02:00
Marko Viitanen	a5a10a33c3	Enable --scaling-list parameter and add to the documentation	2018-11-19 10:47:30 +02:00
Reima Hyvönen	710ba288db	Chroma has some problems	2018-11-15 16:42:48 +02:00
Sami Ahovainio	8f98d4aac7	Added square search	2018-11-14 14:50:31 +02:00
Marko Viitanen	6871490dd5	Simplify get_mvd_coding_cost(), only include golomb coding	2018-11-14 14:33:31 +02:00
Ari Lemmetti	a832206bb6	Replace 32-bit incompatible intrinsics	2018-11-12 18:54:33 +02:00
Ari Lemmetti	5c774c4105	Rewrite most of FME and interpolation filters Changes had to break a lot of stuff and were just squashed into this horrible code dump	2018-11-08 20:21:16 +02:00
Joose Sainio	1c8a1f24e2	Don't assume anything about bits spent	2018-11-07 16:03:38 +02:00
Joose Sainio	3471e2470d	Fix using uninitialized value for the first frame	2018-11-07 08:17:39 +02:00
Joose Sainio	d95ac11a3b	Fix rate_control for other LP-GOPS	2018-11-06 14:20:44 +02:00
Joose Sainio	67a6ba667e	Fix rate control for flat lp-gop	2018-11-06 09:38:17 +02:00
Reima Hyvönen	7406c33a42	Some more cleaning	2018-10-26 12:25:18 +03:00
Reima Hyvönen	4c71546b2e	Cleaned some coding	2018-10-26 12:19:44 +03:00
Reima Hyvönen	4fe3909e48	Switched luma to use 32-bit ints instead of 16-bit ones	2018-10-24 18:24:46 +03:00
Eemeli Kallio	284e73839e	Calculating zero cost moved to its own function	2018-10-16 11:02:01 +03:00
Reima Hyvönen	381e786e10	Trying to find the bug in luma	2018-10-11 18:08:41 +03:00
Marko Viitanen	c589e5ed36	Fix closed-gop frame feed, the ordering was incorrect after the first GOP	2018-10-10 11:12:03 +03:00
Reima Hyvönen	2f5f81bac3	removed the non-optimized bipred function	2018-10-09 11:19:23 +03:00
Marko Viitanen	75dce4f3ce	Fix low-delay-gop usage with --no-open-gop	2018-10-04 15:16:02 +03:00
Marko Viitanen	de71b58f76	Change closed GOP structure to include an additional IDR between GOPs	2018-10-04 11:17:03 +03:00
Reima Hyvönen	212a8e68fa	Modified to avoid memory overflow, still some bug inside luma	2018-10-02 20:23:32 +03:00
Marko Viitanen	954f07e3d7	Add --(no-)open-gop option	2018-10-02 10:05:32 +03:00
Marko Viitanen	8bef85e056	Merge branch 'set-qp-in-cu'	2018-09-03 08:33:33 +03:00
Ari Lemmetti	2fdcc2b79d	Add option --set-qp-in-cu	2018-09-03 08:32:45 +03:00
Reima Hyvönen	896034b7cf	Some renamed functions back	2018-08-28 15:31:10 +03:00
Reima Hyvönen	e8b5e6db4c	Did some merging	2018-08-28 15:26:27 +03:00
Reima Hyvönen	7de5c74434	Updated bipred_recon to work faster	2018-08-28 15:12:31 +03:00
Reima Hyvönen	47b357cca2	Comment one test	2018-08-27 18:52:14 +03:00
Reima Hyvönen	2ca99a44e8	Updated shuffle operation to be in right order	2018-08-27 18:16:38 +03:00
Marko Viitanen	b85ae3688e	Signal QP in slice header if tiles and slices=tiles are enabled Keeps the PPS constant for various purposes	2018-08-16 08:44:39 +03:00
Reima Hyvönen	508b218a12	some modifications made to prevent reading too much	2018-08-14 10:50:39 +03:00
Reima Hyvönen	1d935ee888	some useless stuff removed	2018-08-13 16:47:11 +03:00
Reima Hyvönen	ce3ac4c05e	some modifications to no_mov	2018-08-13 16:41:02 +03:00
Reima Hyvönen	15a613ae94	test if no_mov breaks testing	2018-08-13 16:02:56 +03:00
Reima Hyvönen	97a2049e58	removed pointer declaration out from switch	2018-08-10 16:42:26 +03:00
Reima Hyvönen	aa94bcedbc	Stream is now pointer	2018-08-10 16:38:49 +03:00
Reima Hyvönen	fa5b227ece	256 to 32 doesn't work, made them by hand	2018-08-10 16:01:20 +03:00
Reima Hyvönen	408dedbcc8	removed _mm256_extract_epi8 and replaced with _mm_stream	2018-08-10 15:53:26 +03:00
Reima Hyvönen	31c35091c6	_mm256_cvtsi256_si32 removed	2018-08-10 10:06:40 +03:00
Reima Hyvönen	99dc43074f	_mm256_cvtsi256_si32 breaks system, too many bits. back to extract	2018-08-10 09:59:33 +03:00
Reima Hyvönen	4f1f80b2cb	Transformed convert from 256 to cast 256 -> 128 and then convert from 128	2018-08-09 15:35:54 +03:00
Reima Hyvönen	4957555eb3	Removed leftover from 939	2018-08-09 15:25:03 +03:00
Reima Hyvönen	28b165c971	Clarified some sections, added _MM_SHUFFLE macro	2018-08-09 15:23:01 +03:00
Reima Hyvönen	dd04df8667	testing if error in both avx2 functions	2018-08-03 11:49:00 +03:00
Reima Hyvönen	ed50d71fde	Switched some variables to different location, altered inter_recon_bipred_avx2 function	2018-08-02 16:08:59 +03:00
Reima Hyvönen	f5739a0028	Renaming and removing useless prints	2018-08-02 14:47:17 +03:00
Reima Hyvönen	bc09f59bb6	Edited some definitions	2018-08-02 11:54:53 +03:00
Arttu Ylä-Outinen	83555c3d6d	Enable --fast-residual-cost with fastest presets	2018-07-16 12:31:20 +03:00
Arttu Ylä-Outinen	c438bb4a19	Add an option to skip CABAC for residual costs Adds command line option --fast-residual-cost=<limit>. When QP is below the limit, estimates the cost of coding the residual coefficients from the sum of absolute coefficients. Skipping CABAC is not worth it with high QPs because there are fewer coefficients so CABAC is not as slow.	2018-07-16 12:31:20 +03:00
Reima Hyvönen	a4bf77f208	Tested some extract functions	2018-07-12 09:29:32 +03:00
Reima Hyvönen	c05033a893	Even more useless vectors removed	2018-07-11 15:09:14 +03:00
Reima Hyvönen	884cb77238	Removed some not used vectors	2018-07-11 15:06:11 +03:00
Reima Hyvönen	792689a5ff	Removed for-loops, added extract instead	2018-07-11 14:56:41 +03:00
Reima Hyvönen	f9c7f6ee66	Added some break-operations for avx2 optimization	2018-07-11 14:15:38 +03:00
Reima Hyvönen	cc064da143	some more optimization for bipred	2018-07-11 11:27:54 +03:00
Reima Hyvönen	9a339eef89	Merge branch 'bipred_recon' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into HEAD # Conflicts: # build/kvazaar_lib/kvazaar_lib.vcxproj	2018-07-10 16:21:04 +03:00
Reima Hyvönen	a22cf03ddb	Updated to have no movement function to avx2 strategies	2018-07-10 16:07:15 +03:00
Arttu Ylä-Outinen	b7474eb532	Fix SAO buffer sizes Increases sizes of buffers used for SAO reconstruction to avoid stack buffer overflow in AVX2 SAO reconstruction.	2018-07-05 15:56:30 +03:00
Arttu Ylä-Outinen	b37470e80f	Merge pull request #207 from jbeich/maltivec Unbreak build on PowerPC if AltiVec isn't supported	2018-07-04 11:06:41 +03:00
Reima Hyvönen	ea83ae45f0	Working solution	2018-07-03 11:18:51 +03:00
Jan Beich	4f4bea7496	Check -maltivec is supported before using PowerPC target may lack or have non-standard FPU: $ cc -dumpmachine powerpcspe-undermydesk-freebsd $ cc -c -maltivec -Isrc src/strategies/altivec/picture-altivec.c src/strategies/altivec/picture-altivec.c:1: error: AltiVec and E500 instructions cannot coexist	2018-07-02 23:25:23 +00:00
Jan Beich	b892d820f8	Clean up macOS includes on powerpc* after `93e1c9f1c3` strategyselector.c:426:25: machine/cpu.h: No such file or directory	2018-07-02 21:52:45 +00:00
Reima Hyvönen	17babfffa4	25.6 working optimization, ~50% faster than original	2018-06-25 17:06:16 +03:00
Arttu Ylä-Outinen	2f995f4325	Merge pull request #205 from jbeich/powerpc Unbreak build on non-Linux powerpc*	2018-06-19 13:28:00 +03:00
Arttu Ylä-Outinen	c1398ef818	Permit --period=1 with any GOP structure All intra coding is a special case so it can be permitted even though Kvazaar normally only supports intra periods that are divisible by the GOP length.	2018-06-18 12:26:11 +03:00
Arttu Ylä-Outinen	abdebe0bf9	Fix --owf help message The number of parallel frames is --owf plus one, not --owf minus one. Fixes #204.	2018-06-18 09:33:36 +03:00
Jan Beich	93e1c9f1c3	Add AltiVec detection for BSDs strategyselector.c:377:26: linux/auxvec.h: No such file or directory	2018-06-17 15:38:24 +00:00
Miika Metsoila	98972d26c2	Document that the high tier requires level 4 or higher	2018-06-14 12:41:03 +03:00
Miika Metsoila	62b44efaa4	Write the encoding tier (main/high) into the bitstream	2018-06-14 12:41:03 +03:00
Arttu Ylä-Outinen	a343f6d587	Prepare for delta QPs at CU-level - Replaces lcu_dqp_enabled with max_qp_delta_depth in encoder_control_t. - Fixes set_cu_qps so that it can handle quantization groups of arbitrary size. - Fixes computation of QP predictors so that it works for quantization groups of arbitrary size.	2018-06-13 15:36:19 +03:00
Arttu Ylä-Outinen	dc6b2024ea	Modify reference count asserts to fix data races Changes asserts on the reference count of objects to assert the value after KVZ_ATOMIC_INC instead of directly checking the value. Fixes some data races detected by TSan.	2018-06-12 09:35:07 +03:00
Ari Lemmetti	4fb1c16c61	Add early termination for intra rdo when a zero coefficient block is found.	2018-06-08 21:03:07 +03:00
Ari Lemmetti	492529fb7a	Add the same comment to help message as well...	2018-05-30 14:13:15 +03:00
Ari Lemmetti	0d5972bf03	Add missing sort to intra transform split search so mode at 0 is the best	2018-05-21 13:10:38 +03:00
Sebastien Alaiwan	954bca7d6e	Fix memset parameter	2018-05-17 11:24:49 +02:00
Jaakko Laitinen	f9466efcbb	Close file on error	2018-05-15 11:50:16 +03:00
Reima Hyvönen	9fed29f950	optimization for inter_recon_bipred	2018-04-18 15:25:44 +03:00
Arttu Ylä-Outinen	5c585c4fbc	Update help message Updates the default option values to match the medium preset.	2018-04-03 10:40:37 +03:00
Arttu Ylä-Outinen	2b4e22111a	Update presets The new presets are slower but have better coding efficiency.	2018-04-03 10:37:30 +03:00
Arttu Ylä-Outinen	7185519a1b	Update command line help - Adds missing default values. - Adds help for --crypto and --key. - Adds help for --rd=3. - Adds help for --sao options. - Some changes to help wording.	2018-03-23 14:33:04 +02:00
Arttu Ylä-Outinen	3606860504	Add --no-cpuid option Equivalent to --cpuid=0.	2018-03-23 12:32:27 +02:00
Arttu Ylä-Outinen	fb462b25ef	Fix transform skip for inter The transform skip flag in cu_info_t was stored under the intra substruct even though transform skip can be used for inter as well. This caused bitstream errors. Fixed by moving the flag out of the substruct.	2018-03-20 11:01:33 +02:00
Arttu Ylä-Outinen	b64e46707d	Skip raster scan step in TZ search Raster scan is very slow and the BD-rate improvement is marginal.	2018-03-01 14:04:03 +02:00
Arttu Ylä-Outinen	6877064230	Add zero neighborhood check to TZ search Adds an additional grid search step that starts from the zero motion vector after the normal grid search. The search range for this step is half of the normal range.	2018-03-01 14:02:13 +02:00
Arttu Ylä-Outinen	74a413c46a	Switch to star refinement in TZ search	2018-03-01 13:06:14 +02:00
Arttu Ylä-Outinen	ebee428ee1	Add loop termination to TZ grid search Terminates the grid search if no better motion vector was found in the last three iterations.	2018-03-01 13:06:06 +02:00
Arttu Ylä-Outinen	4c175621dd	Fix TZ grid search and star refinement - Changes TZ grid search and star refinement to keep the origin constant instead of moving to the best position after each iteration. - Changes star refinement to loop until there is no more improvement, instead of running the step only once.	2018-03-01 12:56:57 +02:00
Arttu Ylä-Outinen	9c2d0074a2	Add rounding of motion vectors in inter search When the starting point for integer motion estimation was selected among the merge candidates, the candidate motion vectors were always rounded down. This commit changes the rounding so that they are rounded to the nearest integer MV instead.	2018-03-01 09:39:21 +02:00
Ari Lemmetti	662430d441	Select CU type based on SSD, transform unit tree and mode cost of luma and chroma on --rd=2	2018-02-22 19:26:48 +02:00
Arttu Ylä-Outinen	cb06cfeadb	Drop temporary arrays in bipred search Changes bipred search to use the original source and reconstruction arrays directly instead of copying them.	2018-02-14 11:20:51 +02:00
Arttu Ylä-Outinen	0ea516ba30	Move bipred search to a separate function	2018-02-14 09:56:53 +02:00
Arttu Ylä-Outinen	6f506be12d	Drop dynamic allocation from bipred search Moves the temporary LCU struct used in bipred search from the heap to the stack. The single malloc call was a huge bottleneck in bipred.	2018-02-14 09:55:02 +02:00
Arttu Ylä-Outinen	7155dd0db7	Add negative references to L1 list Changes reference index list creation so that the negative references are added to L1 in addition to L0 when biprediction is enabled and no reordering of pictures is done. Biprediction can now be used with the low-delay GOP structure.	2018-02-07 14:54:52 +02:00
Arttu Ylä-Outinen	4b24cd03a2	Update for crypto++ 6.0.0 compatibility Changes the crypto module to use unsigned char instead of byte. The byte typedef is no longer included in the global namespace in crypto++ 6.0.0. See https://github.com/weidai11/cryptopp/issues/442. Fixes #184.	2018-02-05 13:35:03 +02:00
Arttu Ylä-Outinen	8c53417006	Check zero coefficient cost for inter Checks the cost of flushing all coefficients of an inter block to zero. This is much faster than doing full RDOQ but can still reduce bitrate significantly. Encoding speed is increased since fewer coefficient bits have to be coded with CABAC.	2018-01-29 12:41:56 +02:00
Arttu Ylä-Outinen	018b5ffa64	Move inter CU reconstruction to a new function Moves code for reconstructing all PUs in an inter CU to a new function kvz_inter_recon_cu in inter.c.	2018-01-24 15:05:39 +02:00
Arttu Ylä-Outinen	405b8c1069	Refactor inter MVD cost functions Moves duplicate code for writing the MVD of a single motion vector from kvz_get_mvd_coding_cost_cabac and encoder_inter_prediction_unit to a new function.	2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen	c1cca1ad7f	Refactor inter MV candidate selection Moves duplicate code for checking the best MV candidate from functions calc_mvd_cost, search_pu_inter_ref and search_pu_inter to a new function.	2018-01-19 08:29:17 +02:00
Arttu Ylä-Outinen	9067aa4535	Remove an unnecessary copy in SMP/AMP search SMP/AMP search is performed using a lower work tree level than the normal inter search so the prediction info must be copied up if an SMP/AMP mode is chosen. Previously pixels and coefficient were copied as well. Changed to only copy prediction info.	2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen	89a930d6dd	Add part mode bitcost when using SMP/AMP blocks	2018-01-18 10:36:26 +02:00
Arttu Ylä-Outinen	fc43643ba5	Use a transform split for SMP and AMP blocks	2018-01-18 10:36:25 +02:00
Arttu Ylä-Outinen	c74ede148b	Fix CBF flags for 4x4 luma blocks CBF flags were not being propagated to the upper level from blocks of size 4x4.	2018-01-18 10:36:25 +02:00
Arttu Ylä-Outinen	0a69e6d18f	Fix selection of transform function for 4x4 blocks DST function was returned for inter luma transform blocks of size 4x4 even though they must use DCT. Fixed by checking the prediction mode of the block in addition to whether it is chroma or luma.	2018-01-18 10:36:25 +02:00
Miika Metsoila	bcedfd6669	Remove the usage of errno in me-steps argument parsing	2018-01-16 14:38:43 +02:00
Miika Metsoila	39ed36830e	Merge branch 'me_steps'	2018-01-16 14:22:59 +02:00
Miika Metsoila	61213e3ad9	Improve step parameter parsing and usage	2018-01-10 15:16:52 +02:00
Arttu Ylä-Outinen	649113a821	Fix inter search being used for 4x4 blocks When 4x4 intra blocks are enabled and inter search is limited to 16x16 and larger blocks, it is possible that inter search is accidentally done for 4x4 blocks. Fixed by checking that block size is at least 8x8 before doing inter search.	2018-01-10 14:21:48 +02:00
Miika Metsoila	e8e0e7596a	Add a step-cutoff parameter for motion estimation search	2017-12-22 14:04:25 +02:00
Miika Metsoila	4e13608b01	Merge branch 'diamond_search'	2017-12-18 14:11:53 +02:00
Miika Metsoila	2cde0d1a18	Document diamond search option	2017-12-12 14:45:01 +02:00
Miika Metsoila	b923b63b42	Add diamond search	2017-12-12 14:40:14 +02:00
Ari Lemmetti	14892fda00	Replace simple coefficient cost estimation with CABAC. Substantial improvement. Approximation proved to be too inaccurate while not giving actually that much speedup.	2017-12-10 01:23:48 +02:00
Miika Metsoila	ea79069dc8	Fix a type warning in encmain.c	2017-12-08 16:22:40 +02:00
Miika Metsoila	6aa4cd7528	Fix type warnings	2017-12-08 16:16:36 +02:00
Miika Metsoila	b3486b5114	Fix gcc/clang warnings and errors in cfg.c	2017-12-08 16:09:00 +02:00
Miika Metsoila	bac07457ea	Merge branch 'hevc_level'	2017-12-08 15:57:38 +02:00
Miika Metsoila	c67a24e6ec	Update readme and --help text	2017-12-07 12:32:46 +02:00
Ari Lemmetti	713e694d82	Define HAVE_STRUCT_TIMESPEC on Visual Studio 2015 and later Fixes redefinition of timespec that Pthreads-Win32 does even if it has been already defined.	2017-12-05 18:26:12 +02:00
Miika Metsoila	f64d42169f	Improve bitrate checking to accommodate non-integer and less than 1 framerates	2017-12-01 17:20:12 +02:00
Miika Metsoila	57cf92d35f	Implement level's bitrate limit checking during encoding	2017-11-28 16:19:44 +02:00
Miika Metsoila	021fb27787	Add high-tier flag	2017-11-20 16:05:28 +02:00
Miika Metsoila	d249059d61	Minor refactoring of level checking	2017-11-20 13:25:26 +02:00
Arttu Ylä-Outinen	cf85d52b9d	Kvazaar version 1.2.0	2017-11-17 15:23:33 +02:00
Miika Metsoila	4c1512e8c5	Add a check for maximum picture width and height for the given level	2017-11-15 16:39:59 +02:00
Arttu Ylä-Outinen	4cb054295a	Fix linkers Overrides the linkers used for kvazaar, libkvazaar.la and kvazaar_tests. When crypto++ is enabled, the C++ linker is used and when it is disabled, the C linker is used. This removes the need to explicitly specify -lstdc++ in configure when crypto++ is used and fixes the build with crypto++ when libstdc++ is not installed.	2017-11-13 15:09:38 +02:00
Miika Metsoila	f9a4aba867	Update documentation, fix input fps default value, remove 0 as default level	2017-11-09 16:53:31 +02:00
Miika Metsoila	ebba0a4f01	Test if input conforms to its level's limits (excluding bitrate)	2017-11-08 16:15:41 +02:00
Miika Metsoila	fb4d0c3cf2	Move level argument parsing to the correct place and give it initial values	2017-11-03 15:47:35 +02:00
Miika Metsoila	61a31054e1	Add level command-line parameter	2017-11-03 13:04:05 +02:00
Arttu Ylä-Outinen	9974380cdd	Fix bipred and temporal MVP - Fixes two errors in calculating the POC for the reference frame for temporal candidate MV scaling. - Fixes using the MV for the wrong direction when the temporal MV predictor block uses bi-prediction. Fixes #160.	2017-10-25 12:26:41 +03:00
Arttu Ylä-Outinen	841597e123	Fix picture and slice types Changes handling of intra pictures for --gop=8 so that every picture with POC divisible by the intra period is intra. The first picture is IDR and the rest of the intra pictures are CRA. POC is not reset at CRA pictures. The leading pictures that follow the CRA picture are changed to RASL so they are allowed to refer to pictures before the CRA picture. Changes inter slice types to P when the L1 reference list is empty and to B otherwise. In all-intra, all pictures are now IDR pictures with POC zero.	2017-10-20 13:35:26 +03:00
Jaakko Laitinen	957b6850c3	Change ref list printout to match hm decoded printout	2017-09-25 13:48:56 +03:00
Arttu Ylä-Outinen	20aea8df63	Fix POCs when using --gop=8 When using --gop=8 with an intra period greater than one, a single POC would be skipped before every intra frame. This commit fixes the problem by turning the intra frames into BLA frames with leading pictures when using --gop=8.	2017-09-19 09:31:58 +03:00
Miika Metsoila	6e00f63469	Remove unused variables from search_pu_inter_ref function	2017-09-18 15:36:37 +03:00
Miika Metsoila	7b0101ce3d	Merge branch 'reflist_changes' # Conflicts: # src/encoderstate.c # src/search_inter.c	2017-09-18 14:59:37 +03:00
Miika Metsoila	769b17768d	Change max function to MAX macro for clang/gcc compatibility. Remove couple of unnecessary comments	2017-09-15 14:21:51 +03:00
Miika Metsoila	5f7c5443a3	Remove inter.poc	2017-09-12 14:23:19 +03:00
Miika Metsoila	6bd78a3da7	Reverse L0 list sort direction	2017-09-12 14:23:18 +03:00
Miika Metsoila	83dc7e7f50	Made L0 to sort and fixed mv_ref_coded in search_pu_inter	2017-09-12 14:23:18 +03:00
Timothe FRIGNAC	d3362a238e	changed strtod to strtol	2017-08-31 15:14:31 +02:00
Timothe FRIGNAC	3a1ab54ff0	Fixed memory leaks	2017-08-31 11:51:41 +02:00
Timothe FRIGNAC	466297fd77	Fixed build error	2017-08-29 17:01:18 +02:00
Timothe FRIGNAC	2e130912cb	Add --key opt	2017-08-28 17:15:13 +02:00
Miika Metsoila	a5f4cf09b5	Switched from storing POCs in inter.poc to state->frame->refLXs array	2017-08-21 16:34:57 +03:00
Arttu Ylä-Outinen	409d2114f0	Fix motion vector constraints Fixes integer motion vectors being constrained more than what was necessary when using --mv-constraint or --wpp.	2017-08-11 14:41:36 +03:00
Arttu Ylä-Outinen	7144a00beb	Rewrite thread queue Changes thread queue so that only the jobs that are ready to run are stored in the queue. Other jobs are kept track of by pointers in the reverse dependency lists of other jobs. When a job is ready to run it is appended to the queue. The job queue is stored as a linked list. The definitions of threadqueue_queue_t and threadqueue_job_t are moved to the .c file, turning them into opaque structs. Makes thread queue code simpler. Fixes some TSan errors.	2017-08-11 14:18:12 +03:00
Arttu Ylä-Outinen	bc47fe94af	Drop thread queue debug code	2017-08-11 14:18:12 +03:00
Eemeli Kallio	e5cbc7a205	--sao now enables full sao	2017-08-11 13:26:55 +03:00
Eemeli Kallio	4c3453d26f	Fixed issue with no-sao argument	2017-08-11 13:12:22 +03:00
Eemeli Kallio	8674c0f5ee	Added parameter for band and edge sao.	2017-08-11 11:57:09 +03:00
Eemeli Kallio	d9b93ea368	Added possibility to skip edge or band sao.	2017-08-11 11:51:49 +03:00
Arttu Ylä-Outinen	4b73bdd9aa	Skip checked motion vectors in early termination Changes the second iteration of early termination to skip the motion vectors that were already checked in the first iteration.	2017-08-09 14:29:09 +03:00
Arttu Ylä-Outinen	606d441362	Skip computing MV cost twice in hexagon search Changes the first step of hexagon search to skip the zero offset since the cost of the motion vector has already been computed.	2017-08-09 14:29:09 +03:00
Arttu Ylä-Outinen	fa4648061d	Add mv, cost and bitcost to inter_search_info_t	2017-08-09 14:29:08 +03:00
Arttu Ylä-Outinen	328f051d7f	Put inter search parameters in a single struct Adds struct inter_search_info_t for holding the parameters that are used by most function related to inter search. Passing the parameters in a single struct greatly reduces the number of parameters for many functions.	2017-08-09 14:27:53 +03:00
Miika Metsoila	0dd069f8af	Fixed using wrong POC in add_temporal_candidate	2017-08-09 13:50:21 +03:00
Miika Metsoila	25e0a954c7	Fixed 2 bugs causing incorrect video output	2017-08-09 13:50:21 +03:00
Arttu Ylä-Outinen	24ecddd2a5	Fix wrong strides in SAO reconstruction Functions kvz_sao_reconstruct and encoder_sao_reconstruct used frame->width as the stride instead of frame->rec->stride when accessing frame->rec->data. This caused errors when using tiles and SAO.	2017-08-01 15:40:49 +03:00
Arttu Ylä-Outinen	f0bf959d17	Fix alignment errors in 32-bit build with MSVC Changes the work_tree parameter in search.c functions from an array to a pointer. Fixes "formal parameter with requested alignment of 8 won't be aligned" errors.	2017-07-28 09:27:02 +03:00
Arttu Ylä-Outinen	9694bd2fae	Fix build on 32-bit systems Function coeff_abs_sum_avx2 that was added in `e950c9b` was outside the AVX2 #if directive.	2017-07-28 09:19:29 +03:00
Arttu Ylä-Outinen	ecb0275cdd	Store CU arrays as pointers to the main array Changes field state->tile->frame->cu_array->data to point to the CU array in the main encoder state. Removes the need to copy the CU array to the main CU array after search.	2017-07-28 08:36:45 +03:00
Arttu Ylä-Outinen	e950c9b101	Add AVX2 implementation for coefficient sum	2017-07-28 07:39:36 +03:00
Arttu Ylä-Outinen	d50ae6990c	Add sum of absolute coefficients to strategies	2017-07-28 07:39:15 +03:00
Arttu Ylä-Outinen	59faca0646	Skip CABAC coefficient cost for --rd=0	2017-07-28 07:33:03 +03:00
Arttu Ylä-Outinen	19e051ea40	Reduce intra threshold Reduces intra threshold for --rd=0 from 20 to 8. Threshold of 20 increased BD-Rate too much.	2017-07-25 13:26:38 +03:00
Arttu Ylä-Outinen	e9cf15465e	Fix inter cost in bipred The cost of coding MV ref indices and MV direction was added to bitcost but not inter cost. Fixed by adding the extra bits to inter as well.	2017-07-24 15:24:04 +03:00
Arttu Ylä-Outinen	edbe00763e	Drop extra parameter in kvz_image_calc_sad Drops the parameter max_lcu_below which was always set to -1.	2017-07-24 15:21:19 +03:00
Arttu Ylä-Outinen	ffac29061f	Fix extrapolated inter SATD	2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen	631ef53d2a	Fix inter cost calculations Inter costs are computed using SAD except when fractional motion estimation or bi-prediction is enabled. This commit changes search_pu_inter_ref to recalculate the cost with SATD. Fixes inter/intra cost comparisons since intra costs are always SATD costs.	2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen	6ce2fb1238	Add pixel offsets to encoder_state_config_tile_t Adds fields offset_x and offset_y to encoder_state_config_tile_t.	2017-07-24 15:11:05 +03:00
Arttu Ylä-Outinen	2380ba0d41	Reduce copying in kvz_get_coeff_cost Changes function kvz_get_coeff_cost to only copy the CABAC contexts and not the whole encoder state. Other threads could be simultaneously using the other parts of the encoder state. Only copying the CABAC fixes a TSan data race warning.	2017-07-24 12:38:41 +03:00
Arttu Ylä-Outinen	24b462f801	Align coefficients to 8 bytes Adds alignment attribute to lcu_coeff_t. The coefficients are sometimes handled as 64-bit integers containing four coefficients so the arrays should be aligned to 8 bytes. Fixes a UBSan error about misaligned reads.	2017-07-24 12:37:37 +03:00
Arttu Ylä-Outinen	5ddb43c6fe	Fix undefined left shifts in rdo Replaces left shifts by multiplications when the operand may be a negative value. Left shift of a negative value is undefined behavior.	2017-07-24 12:35:10 +03:00
Arttu Ylä-Outinen	d1e64ad62b	Fix undefined left shifts Replaces left shifts by multiplications when the operand may be a negative value. Left shift of a negative value is undefined behavior.	2017-07-20 11:15:30 +03:00
Arttu Ylä-Outinen	07b5fb9caf	Fix out-of-bounds read in encoderstate When calling encoder_state_encode_leaf with POC 0, index -1 of the GOP array would be accessed. Fixed by skipping the code for I-frames.	2017-07-20 11:15:30 +03:00
Arttu Ylä-Outinen	8c4a3473a8	Change --owf=auto and --threads=auto selection Changes OWF selection so that it is chosen based on the maximum number of parallel CTUs. Number of threads is limited to prevent overhead from extra threads.	2017-07-20 09:42:28 +03:00
Arttu Ylä-Outinen	4fc9b743c1	Drop an unnecessary pthread_cond_broadcast Drop pthread_cond_broadcast on threadqueue->cond in function kvz_threadqueue_waitfor. The broadcast caused threads to be woken up more often than necessary.	2017-07-19 11:09:30 +03:00
Arttu Ylä-Outinen	14003c6a30	Disable printing PSNR with --no-psnr	2017-07-19 10:38:37 +03:00
Arttu Ylä-Outinen	e90bde5c62	Clarify PSNR output Adds letters Y, U and V to the PSNR output to make it clearer that the printed values are the luma and chroma PSNR.	2017-07-19 10:33:43 +03:00
Arttu Ylä-Outinen	fdb3480b54	Enable strategies for SAO reconstruction Re-enables strategies for SAO reconstruction. They were disabled in commit `ec9ff42`.	2017-07-11 10:35:18 +03:00
Arttu Ylä-Outinen	333dba3884	Add static to SAO strategies	2017-07-11 10:02:01 +03:00
Miika Metsoila	e8cc2d8f6a	Small fixes	2017-07-07 13:58:19 +03:00
Arttu Ylä-Outinen	67a60a35e3	Fix invalid calls to normalize_lcu_weights Changes encoder_state_init_new_frame to only call normalize_lcu_weights when the weights have been written to the array and rate control is enabled. When rate control is disabled, the weights are not used.	2017-07-07 11:05:31 +03:00
Arttu Ylä-Outinen	563bc26e71	Fix out-of-bounds read in AVX2 SAO AVX2 version of SAO loaded offsets with a 256 bit read even though there are only five 32 bit integers.	2017-07-06 13:04:52 +03:00
Arttu Ylä-Outinen	0850b17f96	Drop get_wpp_limit in search_inter WPP limit for motion vectors is now computed inside fracmv_within_tile.	2017-07-05 13:22:53 +03:00
Arttu Ylä-Outinen	2a85f0f5a4	Move hard-coded MV limits to encoder_control_t Adds field max_inter_ref_lcu to encoder_control_t. It is used to set up inter-LCU dependencies in encoder_state_encode_leaf and restrict motion vectors in fracmv_within_tile.	2017-07-05 13:22:53 +03:00

2554 commits