hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-30 20:54:07 +00:00

Author	SHA1	Message	Date
Ari Lemmetti	33295bf350	Use AVX2 luma interpolation for SMP and AMP as well	2021-03-08 22:36:09 +02:00
Ari Lemmetti	7ce68761c2	Add a reminder to fix a rare case for bipred	2021-03-08 22:36:09 +02:00
Ari Lemmetti	475f1d79d5	Add some defines for important interpolation related sizes	2021-03-08 22:36:09 +02:00
Ari Lemmetti	4314f3a9a7	Rename some interpolation functions and strategies for consistency	2021-03-08 22:36:08 +02:00
Ari Lemmetti	5a70b49f69	Require 64-bit build for AVX2 interpolation filter functions	2021-03-08 22:36:08 +02:00
Ari Lemmetti	5631651469	Remove unused functions and variables	2021-03-08 22:36:08 +02:00
Ari Lemmetti	e38219e489	Fix epol_func signature and function definition	2021-03-08 22:36:07 +02:00
Ari Lemmetti	7e6ba9750f	Add new AVX2 ip filters for chroma	2021-03-08 22:36:07 +02:00
Ari Lemmetti	3476fc62c7	Fix parameter to signed	2021-03-08 22:36:06 +02:00
Ari Lemmetti	e572066e46	Add new AVX2 vertical ip filter for pixel precision	2021-03-08 22:36:06 +02:00
Ari Lemmetti	9e4b62a891	Use the new horizontal filter for pixel precision as well	2021-03-08 22:36:06 +02:00
Ari Lemmetti	2175023843	Relocate function	2021-03-08 22:36:06 +02:00
Ari Lemmetti	f5b0e3c52b	Add new AVX2 horizontal ip filter capable of every luma PB	2021-03-08 22:36:05 +02:00
Ari Lemmetti	d9a3225ae5	Add new AVX2 vertical ip filter for high-precision	2021-03-08 22:36:05 +02:00
Ari Lemmetti	84222cf3e7	Replace old block extrapolation with more capable one. Separate paddings for different directions can be now specified.	2021-03-08 22:36:04 +02:00
Marko Viitanen	e05dcdb193	Enable sign hiding in quant_avx2 and fix a bug in kvz_encode_coeff_nxn_generic()	2021-02-12 16:40:28 +02:00
Marko Viitanen	79c36f6aeb	Enable RDOQ and sign hiding	2021-02-12 13:24:02 +02:00
Arttu Makinen	7098a94a6f	Implemented implicit MTS. Added selection of implicit MTS to command parameters. Updated the transform selection to support implicit MTS.	2021-02-11 15:11:15 +02:00
Arttu Mäkinen	8f34685a8f	Merge branch 'master' into 'mts' # Conflicts: # src/cfg.c # src/kvazaar.h	2021-02-10 13:05:18 +02:00
Arttu Makinen	c5570abe1b	Removed 'emt' variable from cu_info_t and changed 'emt' globally to 'mts' for consistency.	2021-02-10 12:08:05 +02:00
Arttu Makinen	d0b7dd95f7	MTS works on intra mode. Fixed usage of MTS constraints. Fixed DCT8 transforms. Added sorting function of MTS modes with intra modes and costs to search.c.	2021-02-10 11:01:58 +02:00
Arttu Makinen	2e7c342645	Implemented DCT2, DST7, and DCT8 transforms, and search for selecting transform for MTS. Using MTS results mismatch for luma component.	2021-02-02 11:09:43 +02:00
Arttu Makinen	b9c3336f0e	MTS bitstream encoding added for intra. Work with depths 0-3.	2021-01-18 20:44:36 +02:00
Arttu Makinen	98a8e78e93	avx2/encode_coding_tree-avx2.c update, because it caused errors	2020-12-30 14:25:16 +02:00
Pauli Oikkonen	816789c9f4	Allow fast coeff weights to be read from a file	2020-10-29 15:22:51 +02:00
Pauli Oikkonen	6799019db0	Move fast coeff table to transform.h. Guess this is a more logical place for it	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	4712ce5f59	Round the fast coeff result instead of flooring	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	0fb09c9920	New filtered coeff weight by QP values	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	24d487f553	New weights for 12 <= QP <= 42. Trained using MSU ultrafast settings now	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	3e1c6d84b8	Fix issues in fast coeff estimation. Allow weight table to start from nonzero QP, and round weights to Q8.8 instead of flooring them	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	5f91bda762	Use newer data for fast coeff cost estimation. Same training dataset, but this time only buckets 0...3 were used to approximate the function, no sign/cg width bucket.	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	2abd733199	Use unsigned min() to correctly clip -32768. If a coeff happens to be -32768 (0x8000), its 16-bit abs() is also 0x8000. It should ultimately be clipped to 3, so interpret absolute values as unsigned instead to make that happen.	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	b93b90c0d7	Implement new fast coeff cost estimator in AVX2	2020-10-29 15:20:27 +02:00
Pauli Oikkonen	2f74a112b3	Try first lookup table based fast coeff estimation	2020-10-29 15:20:27 +02:00
Marko Viitanen	2db3a07b14	Prevent cu_sig_model_chroma array from being indexed over the limit	2020-10-13 14:14:57 +03:00
Marko Viitanen	bddfb47a55	Merge remote-tracking branch 'remotes/kvazaar_github/master'	2020-09-25 11:49:11 +03:00
Marko Viitanen	449975b0fb	Fixed cubic filter usage in intra angular modes	2020-09-21 14:58:34 +03:00
Pauli Oikkonen	780da4568a	Exclude 8-bit-only code from 10-bit builds and use uint8_t instead of kvz_pixel for code that assumes 8-bit pixels	2020-09-02 17:46:33 +03:00
Marko Viitanen	574c4d06ee	Fix use of log2_cg_size in coeff coding -> smaller blocks also decoded correctly	2020-08-27 18:26:16 +03:00
Marko Viitanen	20b66c9949	Sync to VTM 8.2 and add separate height to last_sig coding	2020-04-29 08:52:38 +03:00
Jan Beich	1fa69c705d	Rename truncate() from `30ce461d98` to avoid conflict with POSIX version. strategies/avx2/dct-avx2.c:55:23: error: static declaration of 'truncate' follows non-static declaration static INLINE __m256i truncate(__m256i v, __m256i debias, int32_t shift) /usr/include/stdio.h:448:6: note: previous declaration is here int truncate(const char *, __off_t);	2020-04-22 16:09:42 +00:00
Marko Viitanen	86d76b19a4	Fix intra neighboring block selection and clean some unused code	2020-04-16 14:12:40 +03:00
Ari Lemmetti	f31dddc019	Bypass inverse quantization and inverse transform when trying early skip	2020-04-10 16:02:09 +03:00
Pauli Oikkonen	8617530b13	Use _mm_store_epi64 instead of _mm_cvtsi128_si64. Fix 32-bit builds that tend to lack the cvt intrinsic. Hope it will be optimized to a movq r64, xmm on modern platforms though	2020-04-07 23:51:54 +03:00
Pauli Oikkonen	a82966c0f5	Fix lacking _mm256_cvtss_f32 intrinsic on VS. Cast __m256 into __m128 first, the XMM variant of the intrinsic has been around for a long enough time to be supported	2020-04-07 22:38:10 +03:00
Ari Lemmetti	901c25c0c8	Merge branch 'vaq'	2020-04-03 19:51:17 +03:00
Ari Lemmetti	51451be5ef	Handle cases where the number of pixels is not divisible by 32	2020-04-03 19:37:47 +03:00
siivonek	e5267f7706	Fix define for use with Visual Studio.	2020-04-03 15:11:01 +02:00
Pauli Oikkonen	addc1c3ede	Fix warning about potentially unused hsum_8x32b. There's a lot of alternative options available, such as making it globally visible with a kvz_ prefix, force inlining it, or anything. This could be good too, hope it won't be compiled at all to translation units where it's not used.	2020-04-02 16:44:22 +03:00
siivonek	566680af7b	Move function hsum to file where it is used to avoid errors.	2020-04-02 14:03:06 +02:00
siivonek	58be514e2a	Fix pipeline error.	2020-04-02 13:50:08 +02:00
Pauli Oikkonen	99889dab15	Fix switch(bool) in picture-avx2.c. It passes on GCC but warns on Clang	2020-03-31 15:42:19 +03:00
Jaakko Laitinen	af3d559d8d	Let pu-depth be defined per gop-layer	2020-03-17 17:57:18 +02:00
Pauli Oikkonen	60e7956dc5	Disable inaccurate integer variance calculation for now	2020-03-02 19:18:55 +02:00
Pauli Oikkonen	fc1b91335b	Implement variance calculation in integer math. Maybe this is a bit faster than FP, it's not accurate though	2020-03-02 18:17:18 +02:00
Pauli Oikkonen	35c825c75f	Move hsum_8x32b to avx2_common_functions	2020-02-27 17:52:17 +02:00
Pauli Oikkonen	b00ac7d1c4	AVX2 version of buffer variance calculation	2020-02-25 15:57:56 +02:00
Pauli Oikkonen	1bd9c6dd93	Make a strategy out of pixel_var	2020-02-24 19:37:36 +02:00
Ari Lemmetti	3c7dd0752f	Remove the broken "no mov" branch. Causes hash mismatches for example in SlideShow sequence.	2020-02-03 15:26:31 +02:00
RLamm	30d5df40c5	Custom headers for the distributed coding	2020-01-29 15:54:49 +02:00
Pauli Oikkonen	c3d9e97e9f	Fix VS build	2019-12-12 18:34:55 +02:00
Pauli Oikkonen	7f238ca299	Remove debug print functions. Whoops	2019-12-12 18:19:31 +02:00
Pauli Oikkonen	eefb5e50b3	De-inline pred_filtered_dc functions, shouldn't make much difference though	2019-12-12 17:30:00 +02:00
Pauli Oikkonen	169314de4f	32x32 filtered DC prediction in AVX2	2019-12-11 18:17:06 +02:00
Pauli Oikkonen	fb2481b7e4	16x16 filtered DC implemented in AVX2	2019-12-10 15:54:50 +02:00
Pauli Oikkonen	da370ea36d	Implement AVX2 8x8 filtered DC algorithm	2019-11-28 14:10:10 +02:00
Pauli Oikkonen	5d9b7019ca	Implement a 4x4 filtered DC pred function	2019-11-26 17:05:54 +02:00
Pauli Oikkonen	f1485ab087	Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes?	2019-11-25 15:20:29 +02:00
Marko Viitanen	eb2caf9118	Fix intra angle filter, changed from gauss filter table to run-time calculated 4-tap filter	2019-11-19 15:15:21 +02:00
Pauli Oikkonen	979d66031c	Create a strategy out of intra_pred_filtered_dc	2019-11-19 14:50:31 +02:00
Marko Viitanen	466d8772b0	Apply JVET_P0170_ZERO_POS_SIMPLIFICATION in coeff bypass coding	2019-11-19 14:32:38 +02:00
Pauli Oikkonen	fa4bb86406	Optimize intra_pred_planar_avx2 for 4x4 blocks	2019-11-19 13:39:02 +02:00
Marko Viitanen	17a53230fd	Code cleanup, remove unused arrays and remove tabs	2019-11-18 09:01:23 +02:00
Pauli Oikkonen	4761d228f9	Start to vectorize the 4x4 loop	2019-11-15 17:32:40 +02:00
Pauli Oikkonen	8d45ab4951	Stupidify the 4x4 planar loop for vectorization	2019-11-14 17:14:04 +02:00
Pauli Oikkonen	6d7a4f555c	Also remove 16x16 (A * B^T)^T matrix multiply. Can be done using (B * A^T) instead, it's the exact same	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	2c2deb2366	Tidy AVX2 32x32 matrix multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	98ad78b333	Tidy the old AVX2 32x32 matrix multiply. It was actually a very good algorithm, just looked messy!	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	4a921cbdb5	Retain data as much in YMM registers as possible. This seems to make it a whole lot quicker	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ac4d710e23	Unroll 32x32 matrix multiply, use all regs	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	a58608d0b8	Remove totally unnecessary (A * B^T)^T 32x32 multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	043f53539f	Implement a streamlined matrix-multiply 32x32 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e9da2d851b	Tidy 32x32 fast DCT's helper functions	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e382339182	Implement fast (butterfly) 32x32 DCT in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	b5962dadac	Tidy indentation in AVX2 16x16 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	36a8f89025	Fine-tune 16x16 AVX2 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ca9409de2b	Implement 16x16 DCT as butterfly algorithm in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	7c69a26717	Use aligned loads and stores for AVX2 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e9c65dca6	Align DCT matrices and temp transform buffers	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	148a150522	Align DCT source and dest blocks to cache line	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e60bbf6a6	Slightly tune 16x16 forward DCT. Use an array of __m256i's to store temporary value, essentially letting the compiler enforce alignment and use aligned loads and stores.	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	c0cc0e8a75	Optimize 16x16 multiply by only slicing right mat once	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e463d27f22	Implement streamlined generic 16x16 matrix multiply. It can't be this fast for real, can it?	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	beb85ce9d6	Reorder parameters for 8x8 matrix multiplies	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	292af62256	Implement tailored 16x16 forward DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	30ce461d98	Redo 4x4 matrix multiplication	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	07970ea82f	Streamline by-the-book 8x8 matrix multiplication. Also chop up the forward transform into two tailored multiply functions	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	7ec7ab3361	Implement a tailored AVX2 8x8 DCT	2019-10-28 16:19:42 +02:00
pkubaj	1d7fcf4227	Fix build on powerpc64 with LLVM	2019-09-12 15:05:00 +02:00
Pauli Oikkonen	99597b828a	Work around the ancient Win32 calling convention hassle. See if this'll work now	2019-09-06 13:14:42 +03:00
Pauli Oikkonen	c5ca18950c	Revert "Revert to `6924d90052` due to broken visual studio build". This reverts commit `1dd0619bd7`.	2019-09-05 18:21:55 +03:00
Pauli Oikkonen	55529decd5	Implement _mm256_insert_epi32 and extract pseudo-ops. Visual Studio headers apparently lack these guys	2019-09-05 18:20:52 +03:00
Ari Lemmetti	557bcbc6aa	Make luma or chroma only inter "recon" or predict possible	2019-09-02 17:15:28 +03:00
RLamm	60be6d411c	Intra filtering fixed at least for luma. All intra modes output valid luma (hashes match), but chroma is still broken.	2019-08-30 16:14:00 +03:00
Marko Viitanen	cb0d7c340a	Use the new PDPC filtering in angular intra	2019-08-23 14:44:41 +03:00
Marko Viitanen	5bebb18943	Change intra filtering according to VTM6	2019-08-23 08:56:35 +03:00
Marko Viitanen	a16efe6b52	Merge remote-tracking branch 'remotes/github_kvazaar/master' # Conflicts: # build/kvazaar_VS2013.sln # build/kvazaar_VS2015.sln # build/kvazaar_VS2017.sln # build/kvazaar_cli/kvazaar_cli.vcxproj # build/kvazaar_lib/kvazaar_lib.vcxproj # build/kvazaar_tests/kvazaar_tests.vcxproj # src/encode_coding_tree.c # src/encode_coding_tree.h # src/encoder_state-bitstream.c # src/inter.c # src/strategies/avx2/quant-avx2.c	2019-08-22 15:12:01 +03:00
Ari Lemmetti	1dd0619bd7	Revert to `6924d90052` due to broken visual studio build	2019-08-08 15:15:34 +03:00
Pauli Oikkonen	2852baa673	Separate sign3_diff_epu8 from calc_eo_cat. Just to keep things simple, clear and obvious	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	a858e7dd4b	Combine duplicate code into inline functions	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	de0e97f711	Take 8/16/24b loads and stores into separate functions	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	10979f58fe	Tidy up code	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	9cc11976c0	Combine the delta accumulation from edge and band ddistortion into shared func. This won't reduce object size, but there'll be less duplicate code	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	55d877bd66	Vectorize sao_edge_ddistortion	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	aef0f301d3	Fix function signatures. Mark anything intended as read-only to be const, and fix alignment	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	997fd369b3	Redo calc_sao_edge_dir_avx2. Do it wider, 32 pixels at once!	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	db1e475e02	Use i32 instead of i8 for x/y offsets. Doesn't matter too much, because this number isn't used in SIMD computation, only as a memory reference offset.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	12de466ef5	Reimplement non-band SAO color reconstruction in AVX2. Streamline things to work on 32 pixels at once instead of 8	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	e8bff99329	Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction. Vectorize it all, hope this helps with perf	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	7b5dffa855	Implement calc_sao_offset_array in AVX2. To be efficient, the AVX2 color reconstruction algorithm will need offsets in byte, not dword, arrays. This is completely specific to 8-bit pixels and the function signature is fundamentally distinct from the generic algorithm, so it's better to not strategize SAO offset array calculation.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	08881f5e9b	(TEMP) (TODO) (whatever) Avoid compiler warnings. I want the CI to not crash on its -Wall -Werror, but instead to actually build the thing and report me about actual memory errors etc	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	c18adc5ee0	Redo sao_band_ddistortion_avx2. Avoid branching and do the entire thing on 32 pixels at once in YMMs. Also make the sao_bands function parameter const.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	1bb9a079a8	Fix indentation	2019-08-07 16:35:24 +03:00
Reima Hyvönen	7bc959c7c5	3 sao functions are now working	2019-08-07 16:35:24 +03:00
Reima Hyvönen	0e0f2d3490	made to clear sum vector after it has been set to memory	2019-08-07 16:35:24 +03:00
Reima Hyvönen	f146de7acb	removed some variables to prevent memory losses	2019-08-07 16:35:24 +03:00
Reima Hyvönen	247c3a7a71	converted signed to unsigned int	2019-08-07 16:35:24 +03:00
Reima Hyvönen	ac5c216974	Some more memory error prevention in sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	3fb1cbca35	more editing sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	afbb6fb960	some more modifications to sao_edge_ddistortion_avx2 to prevent memory failures	2019-08-07 16:35:24 +03:00
Reima Hyvönen	3496a57f7a	Edited sao_edge_ddistortion_avx2 to avoid memory overflow	2019-08-07 16:35:24 +03:00
Reima Hyvönen	267ba1d6ce	Modified sao_band_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	e70663b245	added some sub commands to avoid memory read errors	2019-08-07 16:35:24 +03:00
Reima Hyvönen	59dfb4570c	Converted some loads to load int8_t instead of ints	2019-08-07 16:35:24 +03:00
Reima Hyvönen	8b253209a8	Found false address load from calc_sao_edge_dir. Should now work like generic	2019-08-07 16:35:24 +03:00
Reima Hyvönen	50e0a47b7a	Took away __restrict	2019-08-07 16:35:24 +03:00
Reima Hyvönen	8a39eb674e	Removed c-variable from calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	bc0a36830d	Clarified some 6 pixel loads	2019-08-07 16:35:24 +03:00
Reima Hyvönen	1a8b211e05	Added break to line 170	2019-08-07 16:35:24 +03:00
Reima Hyvönen	d05e750ebe	Added some switches to prevent segmentation fault from reading	2019-08-07 16:35:24 +03:00
Reima Hyvönen	203580047d	Defined some AVX functions	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c884c738b1	Updated some commands to match the standard	2019-08-07 16:35:24 +03:00
Reima Hyvönen	b412ed2f59	Removed some setr and used loads in calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c6cc063534	converted some hadd operations at calc_sao_edge_dir_avx2 to cast and extract	2019-08-07 16:35:24 +03:00
Reima Hyvönen	47ac109b10	optimized some of sao_reconstruct_color_avx2 when sao->type == SAO_TYPE_BAND	2019-08-07 16:35:24 +03:00
Reima Hyvönen	96dc60a1ed	first working optimization	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c148aff9fb	Some optimization done to function sao_reconstruct_color_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	bf16ba6cc4	Remade sao_edge_ddistortion_avx2 and calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	79dc39a676	Some editing for sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	06ee52924e	some reconst done to calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00

625 commits