hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-24 02:24:07 +00:00

Author	SHA1	Message	Date
Pauli Oikkonen	816789c9f4	Allow fast coeff weights to be read from a file	2020-10-29 15:22:51 +02:00
Pauli Oikkonen	6799019db0	Move fast coeff table to transform.h Guess this is a more logical place for it	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	4712ce5f59	Round the fast coeff result instead of flooring	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	0fb09c9920	New filtered coeff weight by QP values	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	9bf0cb27b1	Constrain fast cost estimation to QPs we have weights for	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	24d487f553	New weights for 12 <= QP <= 42 Trained using MSU ultrafast settings now	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	3e1c6d84b8	Fix issues in fast coeff estimation Allow weight table to start from nonzero QP, and round weights to Q8.8 instead of flooring them	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	5f91bda762	Use newer data for fast coeff cost estimation Same training dataset, but this time only buckets 0...3 were used to approximate the function, no sign/cg width bucket.	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	2abd733199	Use unsigned min() to correctly clip -32768 If a coeff happens to be -32768 (0x8000), its 16-bit abs() is also 0x8000. It should ultimately be clipped to 3, so interpret absolute values as unsigned instead to make that happen.	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	b93b90c0d7	Implement new fast coeff cost estimator in AVX2	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	2f74a112b3	Try first lookup table based fast coeff estimation	2020-10-29 15:20:27 +02:00
siivonek	bc1206a4d3	Define qp_delta_min & max in global.h instead of calculating them locally.	2020-09-29 13:46:27 +02:00
siivonek	0f3ef786b9	Modify delta QP range assert so it will work with any valid bit depth. Modify VAQ code so it will clip the QP to a proper range which is dependent on bit depth	2020-09-22 20:15:23 +02:00
siivonek	fe6f93a951	Fix delta QP range check assert. Add separate asserts based on bit depth.	2020-09-22 20:15:22 +02:00
Joose Sainio	8143ab971c	Merge branch 'stats-files' # Conflicts: # src/cfg.c # src/cli.c # src/kvazaar.h	2020-09-16 09:25:00 +03:00
Joose Sainio	1c06bd7f3d	Fix POC to be correct for all GOPs and Intra periods, fix issue with vaq	2020-09-14 14:25:48 +03:00
Sami Ahovainio	4d87fb2397	fixed potential out of bounds iteration	2020-09-10 12:59:39 +03:00
Sami Ahovainio	5d521a2444	Added option to force yuv as file format and made the options and file endings case insensitive	2020-09-09 16:05:59 +03:00
Joose Sainio	3fb8b7ebc6	Add --stats-file-prefix option When the option is defined with an option four files prefixlambda.txt, prefixqp.txt, prefixdist.txt, and prefixbits.txt that have the corresponding data for each ctu. This is a debug feature.	2020-09-09 12:35:47 +03:00
Sami Ahovainio	84cabd9c20	Fixed sign match	2020-09-07 15:39:31 +03:00
Sami Ahovainio	d691849594	Added frame header reading for both read and seek functions	2020-09-07 15:31:08 +03:00
Sami Ahovainio	cbcee67821	y4m start header parsing ready	2020-09-07 15:31:07 +03:00
Joose Sainio	c10b841e7c	Merge remote-tracking branch 'remotes/origin/fix-sao-parameter' into master	2020-09-07 13:10:36 +03:00
Joose Sainio	da09d49890	Remove optionality from --sao SAO parameter was optional which caused that if one wants to pass argument one needs to use "=" which is confusing since this is not required for any other parameter	2020-09-07 12:35:40 +03:00
Pauli Oikkonen	3f7f0d7ed7	Allow bit depth to be defined from the outside For a 10-bit build, just use: env CFLAGS="-DKVZ_BIT_DEPTH=10" ./configure && make clean && make	2020-09-02 17:55:22 +03:00
Pauli Oikkonen	780da4568a	Exclude 8-bit-only code from 10-bit builds and use uint8_t instead of kvz_pixel for code that assumes 8-bit pixels	2020-09-02 17:46:33 +03:00
Pauli Oikkonen	31ef4e4216	Fix ml functions to accept kvz_pixel, not uint8_t	2020-09-02 17:46:33 +03:00
Joose Sainio	faf5cc858d	Merge branch 'fix-lp-gop-rc'	2020-06-25 09:41:57 +03:00
Joose Sainio	138651ee85	Fix the bit and frame counts for calculating the gop allocation Additionally dynamically adjust the smoothing window if there are rapid changes	2020-06-24 15:26:54 +03:00
Ari Lemmetti	f8ff6dd567	Merge pull request #262 from jbeich/truncate-freebsd Unbreak build on FreeBSD	2020-06-22 18:08:01 +03:00
Ari Lemmetti	d1abf85229	Add MV constraint check to motion estimation start point	2020-06-01 23:51:38 +03:00
Jan Beich	1fa69c705d	Rename truncate() from `30ce461d98` to avoid conflict with POSIX version strategies/avx2/dct-avx2.c:55:23: error: static declaration of 'truncate' follows non-static declaration static INLINE __m256i truncate(__m256i v, __m256i debias, int32_t shift) ^ /usr/include/stdio.h:448:6: note: previous declaration is here int truncate(const char *, __off_t); ^	2020-04-22 16:09:42 +00:00
Ari Lemmetti	9753820b3a	Update version to 2.0.0	2020-04-22 01:03:36 +03:00
Ari Lemmetti	40e81f3243	Update preset tables. Update docs.	2020-04-22 01:03:21 +03:00
siivonek	54f438a75c	Update VAQ help text. Update docs. Change some lingering tabs to spaces.	2020-04-20 16:52:07 +02:00
Ari Lemmetti	f31dddc019	Bypass inverse quantization and inverse transform when trying early skip	2020-04-10 16:02:09 +03:00
Pauli Oikkonen	fbdb1e2d15	Add correct path to sao_shared_generics.h in makefile	2020-04-08 19:27:12 +03:00
Pauli Oikkonen	8617530b13	Use _mm_store_epi64 instead of _mm_cvtsi128_si64 Fix 32-bit builds that tend to lack the cvt intrinsic. Hope it will be optimized to a movq r64, xmm on modern platforms though	2020-04-07 23:51:54 +03:00
Pauli Oikkonen	a82966c0f5	Fix lacking _mm256_cvtss_f32 intrinsic on VS Cast __m256 into __m128 first, the XMM variant of the intrinsic has been around for a long enough time to be supported	2020-04-07 22:38:10 +03:00
Joose Sainio	c369ff8873	Fix a potential division by zero in a floating point operation When C is calculated with K if the value of K is not clipped before in some cases it is possible that K gets such a large negative value that bpp^K is rounded to zero. In real-life cases this is extremely rare and clipping beforehand has very little to no effect. Also remove commented debug prints	2020-04-06 11:05:49 +03:00
Ari Lemmetti	901c25c0c8	Merge branch 'vaq'	2020-04-03 19:51:17 +03:00
Ari Lemmetti	51451be5ef	Handle cases where the number of pixels is not divisible by 32	2020-04-03 19:37:47 +03:00
siivonek	ee544304f1	Make function static to not mess up tests.	2020-04-03 15:22:34 +02:00
siivonek	e5267f7706	Fix define for use with Visual Studio.	2020-04-03 15:11:01 +02:00
siivonek	9e34369304	Merge branch 'vaq' of https://gitlab.tut.fi/TIE/ultravideo/kvazaar into vaq	2020-04-03 12:35:04 +02:00
siivonek	d025977949	Clamp edge lcu pixels if dimensions are not 64 divisible.	2020-04-03 12:33:14 +02:00
Pauli Oikkonen	addc1c3ede	Fix warning about potentially unused hsum_8x32b There's a lot of alternative options available, such as making it globally visible with a kvz_ prefix, force inlining it, or anything. This could be good too, hope it won't be compiled at all to translation units where it's not used.	2020-04-02 16:44:22 +03:00
siivonek	e3ba0bfb8c	Fix memory leak.	2020-04-02 14:15:36 +02:00
siivonek	566680af7b	Move function hsum to file where it is used to avoid errors.	2020-04-02 14:03:06 +02:00
siivonek	58be514e2a	Fix pipeline error.	2020-04-02 13:50:08 +02:00
siivonek	2aa0d97589	Add VAQ test in test_tools. Bump minor version number in configure.ac. Update help text for VAQ.	2020-04-01 18:16:39 +02:00
siivonek	c6e421019e	Merge vaq-simd	2020-03-31 21:40:29 +02:00
Jaakko Laitinen	8e4b738900	Fix error when first value in pu depth list is omitted	2020-03-31 16:57:12 +03:00
Jaakko Laitinen	54ef0bbfd2	Fix unintended functionality when giving multiple --pu-depth-intra/inter list parameters	2020-03-31 16:39:56 +03:00
Jaakko Laitinen	cb0c7b23b5	Merge branch 'intra_qp_offset_auto' into 'master' Add auto option to intra-qp-offset See merge request TIE/ultravideo/kvazaar!7	2020-03-31 16:17:36 +03:00
Pauli Oikkonen	99889dab15	Fix switch(bool) in picture-avx2.c It passes on GCC but warns on Clang	2020-03-31 15:42:19 +03:00
Jaakko Laitinen	e0440c3de1	Update docs	2020-03-31 15:27:48 +03:00
Jaakko Laitinen	7760dcf441	Remove intra qp offset from preset parameters	2020-03-31 14:06:07 +03:00
Jaakko Laitinen	8bd1a2b667	Update help message	2020-03-31 13:19:05 +03:00
Jaakko Laitinen	b4f5486190	Set intra qp offset default to auto	2020-03-31 12:58:40 +03:00
Jaakko Laitinen	740688c67d	Add auto option to intra qp offset	2020-03-31 11:56:44 +03:00
Pauli Oikkonen	0c7bfa7dc9	Fix AVX2 on Clang Besides just -mavx2, AVX2 support depends on a couple minor instruction set extensions that should always exist on AVX2-capable hardware. Too bad the different bit twiddling instructions are invoked slightly differently between GCC and Clang, but now Clang seems to also produce an AVX2-capable build.	2020-03-26 18:48:48 +02:00
siivonek	89d3e674ce	Comment out code which possibly messes up OBA	2020-03-26 17:49:31 +02:00
siivonek	be7d9ddec5	Fix error in frame variance calculation. Chroma channels were not added to variance	2020-03-26 14:33:00 +02:00
Jaakko Laitinen	45ca8f8113	Merge branch 'master' into 'extended_pu-depths'	2020-03-25 15:11:08 +02:00
siivonek	5986e71535	Fix mistake	2020-03-20 13:43:44 +02:00
Jaakko Laitinen	d6ffe9e495	Update docs	2020-03-20 13:27:07 +02:00
Jaakko Laitinen	621450cc1d	Update --help	2020-03-20 13:07:48 +02:00
Jaakko Laitinen	aaac3df69b	Add prefix to kvazaar.h define	2020-03-20 09:04:00 +02:00
siivonek	2a85be5752	Move qp_to_lambda so it is defined before use. Change some tabs to spaces	2020-03-19 22:13:53 +02:00
siivonek	0a4ce3c0aa	Add vaq to new rate control	2020-03-19 21:43:52 +02:00
siivonek	1bbc598d75	Merge branch 'master' into vaq	2020-03-19 20:19:43 +02:00
Joose Sainio	b53911d637	Merge branch 'rc-intra'	2020-03-19 13:34:15 +02:00
Joose Sainio	a304a8ea6e	Add weights for GOP 16 based on fitting a power curve to bits spent by HM	2020-03-19 11:13:43 +02:00
Joose Sainio	e823ac1dae	miscellaneous fixes - bump library version - add help desk for --clip-neighbour - update the default values of --clip-neighbour and --intra-bits - update tests to more sensible	2020-03-19 10:47:28 +02:00
Jaakko Laitinen	b2ddba38c2	Set correct size for pu-depth min/max data structure	2020-03-19 09:29:43 +02:00
Joose Sainio	2c345bc3cf	try to fix tsan issue	2020-03-18 14:58:54 +02:00
Jaakko Laitinen	fe428dcbe1	Fix no gop functionality	2020-03-18 11:03:33 +02:00
Jaakko Laitinen	af3d559d8d	Let pu-depth be defined per gop-layer	2020-03-17 17:57:18 +02:00
Ari Lemmetti	cbd77944d8	Costs in rough intra search may be negative. Get rid of UBSan error.	2020-03-16 22:13:14 +02:00
Ari Lemmetti	aa0ade3f65	Cast values to unsigned to make UBSan not trigger due to left-shifting negatives	2020-03-16 19:52:34 +02:00
RLamm	27fe716654	Fixed reference POC indexing	2020-03-11 15:33:37 +02:00
RLamm	bf24831780	Attempt to fix random crashes	2020-03-11 15:31:47 +02:00
RLamm	887659db1f	Attempted to scale the extra_mvs	2020-03-11 15:31:46 +02:00
siivonek	8d9719ff90	Merge branch 'master' into vaq	2020-03-05 14:17:01 +02:00
Joose Sainio	c9a8f2a596	Completely disable intra based model for frame 1	2020-03-04 12:52:13 +02:00
Joose Sainio	19c79c3e58	don't use the intra frame based estimation if the result is bad	2020-03-04 09:26:22 +02:00
Ari Lemmetti	7b7358c25a	Update presets veryslow and placebo a bit Both use now --gop 16, --intra-qp-offset -3, --me tz, and --transform-skip	2020-03-03 20:41:01 +02:00
Pauli Oikkonen	60e7956dc5	Disable inaccurate integer variance calculation for now	2020-03-02 19:18:55 +02:00
Pauli Oikkonen	fc1b91335b	Implement variance calculation in integer math Maybe this is a bit faster than FP, it's not accurate though	2020-03-02 18:17:18 +02:00
Pauli Oikkonen	35c825c75f	Move hsum_8x32b to avx2_common_functions	2020-02-27 17:52:17 +02:00
Pauli Oikkonen	b00ac7d1c4	AVX2 version of buffer variance calculation	2020-02-25 15:57:56 +02:00
siivonek	a380e43bda	Add chroma channels to variance calculation.	2020-02-24 19:54:34 +02:00
Pauli Oikkonen	1bd9c6dd93	Make a strategy out of pixel_var	2020-02-24 19:37:36 +02:00
Pauli Oikkonen	86ebf366e1	fix typo	2020-02-24 18:18:10 +02:00
Joose Sainio	f81de41775	Merge branch 'master' into rc-intra	2020-02-24 15:30:57 +02:00
siivonek	5688bcd646	Merge branch 'master' into vaq	2020-02-21 17:11:10 +02:00
siivonek	908ecb1767	Add rounding to aq offsets. Fix typo	2020-02-21 13:51:43 +02:00
Ari Lemmetti	1dfc69b42e	Consider merge index bits in merge analysis and early skip	2020-02-20 09:43:58 +02:00
Joose Sainio	7deb22c8e8	Merge branch 'master' into rc-intra	2020-02-19 15:01:04 +02:00
Kari Siivonen (TAU)	c972ca9067	Add assert to check if deltaQP out of bounds. Clip adaptive QP to [-13, 12].	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	f07990794f	Fix error in vaq pixel blit range calculation	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	57ed40c263	Fix application of aq offset	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	be2f420d61	Change: vaq requires parameter. Parameter defines vaq strength ex. 15 == 1.5	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	bf1b2c1e22	Add define for vaq strength parameter	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	150559a7e8	Fix bugs. Enable set_qp_in_cu when using vaq	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	c8c71274ee	Change tabs to spaces.	2020-02-18 13:20:26 +02:00
siivonek	888382953d	Implement calculation of vaq values. Values not used yet.	2020-02-18 13:20:25 +02:00
siivonek	ad40a88c09	Add no-vaq option to vaq	2020-02-18 13:20:25 +02:00
siivonek	09f0a1c52e	Fix typo in comment	2020-02-18 13:20:25 +02:00
siivonek	84fb3fd7d1	aq: Add --vaq commandline option	2020-02-18 13:20:25 +02:00
Joose Sainio	2a98f5db1e	fix intra-bits for lp-gop	2020-02-18 10:38:29 +02:00
Ari Lemmetti	71d9327f62	Further improve fast bipred	2020-02-17 20:32:52 +02:00
Ari Lemmetti	80c26870d5	Update docs	2020-02-15 23:29:18 +02:00
Ari Lemmetti	ebb183cc01	Add option to make intra QP offset configurable	2020-02-15 22:54:48 +02:00
Ari Lemmetti	be3e08d6db	Add gop.h to Makefile	2020-02-15 22:54:47 +02:00
Ari Lemmetti	1354acd358	Prevent negative values being written to SPS with --gop=0	2020-02-15 22:54:47 +02:00
Ari Lemmetti	fe4869916c	Disable GOP and intra qp offset for all-intra coding automatically	2020-02-15 22:54:46 +02:00
Ari Lemmetti	9849fb7c77	Enable experimental rate control for GOP 16	2020-02-15 22:54:46 +02:00
Ari Lemmetti	a0a22dec8a	Remove deprecated / unused lambda adjustments	2020-02-15 22:54:46 +02:00
Arttu Ylä-Outinen	829a70e6a7	Copy lowdelay GOP definition from HM	2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen	28f99c0b87	Change definition of 8-GOP to match HM	2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen	636fa8fbdd	Fix maximum decoded picture buffer size	2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen	ebd5156db5	Add definition for random access GOP of length 16	2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen	6653f06dd0	Only compute GOP layer weights when RC is enabled	2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen	c8fff1e0d6	Use a larger number of bits for POC lsb when needed Changes the number of bits used for coding the least significant bits of the POC based on the GOP size.	2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen	d757a832c2	Change GOP QP offset handling to match HM Adds fields qp_model_scale and qp_model_offset to kvz_gop_config and intra_qp_offset to kvz_config.	2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen	f37dcd5879	Move GOP definition to a separate file Moves definition of the 8-GOP from cfg.c to gop.h.	2020-02-15 22:36:55 +02:00
Ari Lemmetti	6e1007a3e7	Get rid of LAMBA! (Commit #3000 )	2020-02-15 22:32:52 +02:00
Ari Lemmetti	0c02e71b43	Remove minor error from readme	2020-02-15 22:29:08 +02:00
Joose Sainio	e90d3141a2	Merge branch 'master' into rc-intra	2020-02-05 11:06:56 +02:00
Ari Lemmetti	9a0236bb4e	Add option 'zero-coeff-rdo'	2020-02-04 21:26:29 +02:00
Ari Lemmetti	886ff36d12	Initial implementation of fast bipred.	2020-02-04 15:46:23 +02:00
Ari Lemmetti	3c7dd0752f	Remove the broken "no mov" branch. Causes hash mismatches for example in SlideShow sequence.	2020-02-03 15:26:31 +02:00
RLamm	bf8941ddb8	Added comment about partial-coding usage	2020-01-31 16:19:48 +02:00
RLamm	b8488ab48d	Changed "partial-coding" variables to uint32_t	2020-01-31 16:02:29 +02:00
RLamm	76e3249754	Changed parameter "slicer" to "partial-coding" to avoid confusion.	2020-01-31 14:22:32 +02:00
RLamm	30d5df40c5	Custom headers for the distributed coding	2020-01-29 15:54:49 +02:00
Joose Sainio	54571529a4	Fix accessing previous frame that didn't exist	2020-01-17 10:48:35 +02:00
Joose Sainio	5c671d20e1	Use the new clipping only in situations where it actually helps	2020-01-17 09:08:21 +02:00
Joose Sainio	3c34d7c863	Fix qp estimation and checking of previous frames that don't exist	2020-01-15 09:32:04 +02:00
Joose Sainio	1a35c22a52	Change clipping of lambda and qp for ctus on OBA rc instead of clipping qp and lambda to the value of last value from the state clip to previous frame with same layer and if such frame doesn't exist, clip to previous frame	2020-01-14 14:46:05 +02:00
Pauli Oikkonen	c3d9e97e9f	Fix VS build	2019-12-12 18:34:55 +02:00
Pauli Oikkonen	7f238ca299	Remove debug print functions Whoops	2019-12-12 18:19:31 +02:00
Pauli Oikkonen	eefb5e50b3	De-inline pred_filtered_dc functions, shouldn't make much difference though	2019-12-12 17:30:00 +02:00
Pauli Oikkonen	169314de4f	32x32 filtered DC prediction in AVX2	2019-12-11 18:17:06 +02:00
Pauli Oikkonen	fb2481b7e4	16x16 filtered DC implemented in AVX2	2019-12-10 15:54:50 +02:00
Joose Sainio	b78aa7b272	save c and k to frame	2019-12-06 10:52:54 +02:00
Joose Sainio	5b10e5fb7e	parameterize the clipping option	2019-12-06 09:51:04 +02:00
Pauli Oikkonen	da370ea36d	Implement AVX2 8x8 filtered DC algorithm	2019-11-28 14:10:10 +02:00
Pauli Oikkonen	5d9b7019ca	Implement a 4x4 filtered DC pred function	2019-11-26 17:05:54 +02:00
Joose Sainio	ca0060cbba	try the original clipping	2019-11-26 15:13:04 +02:00
Pauli Oikkonen	f1485ab087	Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes?	2019-11-25 15:20:29 +02:00
Joose Sainio	ab2fded8af	Update threadwrapper to enable pthread_rwlock_t	2019-11-21 13:38:40 +02:00
Joose Sainio	eb78aead1f	Fix additional potential data races	2019-11-21 11:03:12 +02:00
Joose Sainio	35d7e0d88b	Fix data race	2019-11-21 10:25:04 +02:00
Pauli Oikkonen	979d66031c	Create a strategy out of intra_pred_filtered_dc	2019-11-19 14:50:31 +02:00
Joose Sainio	0e8815a3d8	test clipping qp to previous frame instead of previous ctus	2019-11-19 14:32:31 +02:00
Joose Sainio	ddb4e5a131	move the intra bit calculation so that it is used also with lambda rc	2019-11-19 14:16:48 +02:00
Joose Sainio	a07833f3e6	check that mallocs in rc initialization were successful only call kvz_update_after_picture when using the OBA rc	2019-11-19 13:59:44 +02:00
Joose Sainio	50d410a316	re-enable static qp encoding and lambda rc	2019-11-19 13:45:58 +02:00
Pauli Oikkonen	fa4bb86406	Optimize intra_pred_planar_avx2 for 4x4 blocks	2019-11-19 13:39:02 +02:00
Joose Sainio	57e5615ece	Fix incorrect intra rc calculation skipping	2019-11-19 13:25:31 +02:00
Joose Sainio	6cc3bcd87e	Command line parameters for oba rc and implementation of the usage of the intra parameter	2019-11-19 09:29:06 +02:00
Joose Sainio	eb73548af5	Encode first frame completely before starting others to enable owf	2019-11-18 09:51:37 +02:00
Pauli Oikkonen	4761d228f9	Start to vectorize the 4x4 loop	2019-11-15 17:32:40 +02:00
Pauli Oikkonen	8d45ab4951	Stupidify the 4x4 planar loop for vectorization	2019-11-14 17:14:04 +02:00
Joose Sainio	c759c138ed	Prepare the rc data structure to be shared among all frame encoders	2019-11-13 11:56:25 +02:00
Joose Sainio	cdb7c851a4	Fix weight calculation	2019-11-13 08:55:31 +02:00
Joose Sainio	b9b01f8036	WPP with threading	2019-11-12 12:12:57 +02:00
Joose Sainio	615973adca	should enable threading with wpp when owf is not used	2019-11-12 09:03:00 +02:00
Pauli Oikkonen	6f13f6525c	Merge branch 'new_prints'	2019-11-07 17:04:21 +02:00
Joose Sainio	d353f7dd1a	Disable debug prints, fix multiple bugs in the calculation	2019-11-07 15:08:57 +02:00
mercat	57e8c3ebc2	Merge branch 'ML-cplx_red_ICIP'	2019-11-07 13:25:47 +02:00
Pauli Oikkonen	558f0ec401	Mbps, not mbps	2019-11-05 18:06:00 +02:00
Pauli Oikkonen	2edf533925	Tidy the end report printing Also fix a bug with non-integer target FPS	2019-11-05 17:20:00 +02:00
Joose Sainio	408fd4ccb6	Fix lambda and qp calculation for intra frames also fixes a bug with selecting the clip neighbor lambda and clip neighbor qp selection for inter frames	2019-11-05 10:51:39 +02:00
Pauli Oikkonen	c7313ce567	Store AVG QP information in encmain	2019-11-04 17:08:07 +02:00
Reima Hyvönen	80575c59bf	Some updates done to get right bitrate and avg QP	2019-10-31 15:56:24 +02:00
Reima Hyvönen	252bab8820	Added prints to bitrate and AVG QP	2019-10-31 15:56:24 +02:00
Pauli Oikkonen	6d7a4f555c	Also remove 16x16 (A * B^T)^T matrix multiply Can be done using (B * A^T) instead, it's the exact same	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	2c2deb2366	Tidy AVX2 32x32 matrix multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	98ad78b333	Tidy the old AVX2 32x32 matrix multiply It was actually a very good algorithm, just looked messy!	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	4a921cbdb5	Retain data as much in YMM registers as possible This seems to make it a whole lot quicker	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ac4d710e23	Unroll 32x32 matrix multiply, use all regs	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	a58608d0b8	Remove totally unnecessary (A * B^T)^T 32x32 multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	043f53539f	Implement a streamlined matrix-multiply 32x32 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e9da2d851b	Tidy 32x32 fast DCT's helper functions	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e382339182	Implement fast (butterfly) 32x32 DCT in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	b5962dadac	Tidy indentation in AVX2 16x16 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	36a8f89025	Fine-tune 16x16 AVX2 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ca9409de2b	Implement 16x16 DCT as butterfly algorithm in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	7c69a26717	Use aligned loads and stores for AVX2 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e9c65dca6	Align DCT matrices and temp transform buffers	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	148a150522	Align DCT source and dest blocks to cache line	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e60bbf6a6	Slightly tune 16x16 forward DCT Use an array of __m256i's to store temporary value, essentially letting the compiler enforce alignment and use aligned loads and stores.	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	c0cc0e8a75	Optimize 16x16 multiply by only slicing right mat once	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e463d27f22	Implement streamlined generic 16x16 matrix multiply It can't be this fast for real, can it?	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	beb85ce9d6	Reorder parameters for 8x8 matrix multiplies	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	292af62256	Implement tailored 16x16 forward DCT	2019-10-28 16:19:42 +02:00

2870 commits
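
The message of commit 2abd733199 ("Use unsigned min() to correctly clip -32768") describes a wraparound pitfall that is easier to see in code. The following is only a scalar sketch of that reasoning, not the AVX2 code from the repository; the helper names abs16_wrap, clip_abs_signed and clip_abs_unsigned are hypothetical and exist only for this illustration. It assumes the usual two's-complement behaviour when 0x8000 is converted back to int16_t.

```c
#include <stdint.h>

/* Hypothetical helper: 16-bit absolute value with wraparound, as a packed
 * 16-bit abs would behave. The result for -32768 is still 0x8000. */
static uint16_t abs16_wrap(int16_t v)
{
  uint16_t u = (uint16_t)v;
  return (v < 0) ? (uint16_t)(0u - u) : u;
}

/* Signed interpretation: 0x8000 is read back as -32768, so min(a, 3)
 * keeps the negative value instead of clipping to 3.
 * (The uint16_t -> int16_t conversion is assumed to wrap, which is what
 * common two's-complement platforms do.) */
static int clip_abs_signed(int16_t coeff)
{
  int16_t a = (int16_t)abs16_wrap(coeff);
  return (a < 3) ? a : 3;
}

/* Unsigned interpretation: 0x8000 is 32768, so min(a, 3) yields 3. */
static int clip_abs_unsigned(int16_t coeff)
{
  uint16_t a = abs16_wrap(coeff);
  return (a < 3) ? (int)a : 3;
}
```

Under these assumptions, both variants agree for every input except -32768, where only the unsigned comparison clips to 3 as the commit message says it should.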