hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-24 18:34:06 +00:00

Author	SHA1	Message	Date
Pauli Oikkonen	35c825c75f	Move hsum_8x32b to avx2_common_functions	2020-02-27 17:52:17 +02:00
Pauli Oikkonen	b00ac7d1c4	AVX2 version of buffer variance calculation	2020-02-25 15:57:56 +02:00
siivonek	a380e43bda	Add chroma channels to variance calculation.	2020-02-24 19:54:34 +02:00
Pauli Oikkonen	1bd9c6dd93	Make a strategy out of pixel_var	2020-02-24 19:37:36 +02:00
Pauli Oikkonen	86ebf366e1	fix typo	2020-02-24 18:18:10 +02:00
Joose Sainio	f81de41775	Merge branch 'master' into rc-intra	2020-02-24 15:30:57 +02:00
siivonek	5688bcd646	Merge branch 'master' into vaq	2020-02-21 17:11:10 +02:00
siivonek	908ecb1767	Add rounding to aq offsets. Fix typo	2020-02-21 13:51:43 +02:00
Ari Lemmetti	1dfc69b42e	Consider merge index bits in merge analysis and early skip	2020-02-20 09:43:58 +02:00
Joose Sainio	7deb22c8e8	Merge branch 'master' into rc-intra	2020-02-19 15:01:04 +02:00
Kari Siivonen (TAU)	c972ca9067	Add assert to check if deltaQP out of bounds. Clip adaptive QP to [-13, 12].	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	f07990794f	Fix error in vaq pixel blit range calculation	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	57ed40c263	Fix application of aq offset	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	be2f420d61	Change: vaq requires parameter. Parameter defines vaq strength ex. 15 == 1.5	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	bf1b2c1e22	Add define for vaq strength parameter	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	150559a7e8	Fix bugs. Enable set_qp_in_cu when using vaq	2020-02-18 13:20:26 +02:00
Kari Siivonen (TAU)	c8c71274ee	Change tabs to spaces.	2020-02-18 13:20:26 +02:00
siivonek	888382953d	Implement calculation of vaq values. Values not used yet.	2020-02-18 13:20:25 +02:00
siivonek	ad40a88c09	Add no-vaq option to vaq	2020-02-18 13:20:25 +02:00
siivonek	09f0a1c52e	Fix typo in comment	2020-02-18 13:20:25 +02:00
siivonek	84fb3fd7d1	aq: Add --vaq commandline option	2020-02-18 13:20:25 +02:00
Joose Sainio	2a98f5db1e	fix intra-bits for lp-gop	2020-02-18 10:38:29 +02:00
Ari Lemmetti	71d9327f62	Further improve fast bipred	2020-02-17 20:32:52 +02:00
Ari Lemmetti	80c26870d5	Update docs	2020-02-15 23:29:18 +02:00
Ari Lemmetti	ebb183cc01	Add option to make intra QP offset configurable	2020-02-15 22:54:48 +02:00
Ari Lemmetti	be3e08d6db	Add gop.h to Makefile	2020-02-15 22:54:47 +02:00
Ari Lemmetti	1354acd358	Prevent negative values being written to SPS with --gop=0	2020-02-15 22:54:47 +02:00
Ari Lemmetti	fe4869916c	Disable GOP and intra qp offset for all-intra coding automatically	2020-02-15 22:54:46 +02:00
Ari Lemmetti	9849fb7c77	Enable experimental rate control for GOP 16	2020-02-15 22:54:46 +02:00
Ari Lemmetti	a0a22dec8a	Remove deprecated / unused lambda adjustments	2020-02-15 22:54:46 +02:00
Arttu Ylä-Outinen	829a70e6a7	Copy lowdelay GOP definition from HM	2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen	28f99c0b87	Change definition of 8-GOP to match HM	2020-02-15 22:36:58 +02:00
Arttu Ylä-Outinen	636fa8fbdd	Fix maximum decoded picture buffer size	2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen	ebd5156db5	Add definition for random access GOP of length 16	2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen	6653f06dd0	Only compute GOP layer weights when RC is enabled	2020-02-15 22:36:57 +02:00
Arttu Ylä-Outinen	c8fff1e0d6	Use a larger number of bits for POC lsb when needed. Changes the number of bits used for coding the least significant bits of the POC based on the GOP size.	2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen	d757a832c2	Change GOP QP offset handling to match HM. Adds fields qp_model_scale and qp_model_offset to kvz_gop_config and intra_qp_offset to kvz_config.	2020-02-15 22:36:56 +02:00
Arttu Ylä-Outinen	f37dcd5879	Move GOP definition to a separate file. Moves definition of the 8-GOP from cfg.c to gop.h.	2020-02-15 22:36:55 +02:00
Ari Lemmetti	6e1007a3e7	Get rid of LAMBA! (Commit #3000 )	2020-02-15 22:32:52 +02:00
Ari Lemmetti	0c02e71b43	Remove minor error from readme	2020-02-15 22:29:08 +02:00
Joose Sainio	e90d3141a2	Merge branch 'master' into rc-intra	2020-02-05 11:06:56 +02:00
Ari Lemmetti	9a0236bb4e	Add option 'zero-coeff-rdo'	2020-02-04 21:26:29 +02:00
Ari Lemmetti	886ff36d12	Initial implementation of fast bipred.	2020-02-04 15:46:23 +02:00
Ari Lemmetti	3c7dd0752f	Remove the broken "no mov" branch. Causes hash mismatches for example in SlideShow sequence.	2020-02-03 15:26:31 +02:00
RLamm	bf8941ddb8	Added comment about partial-coding usage	2020-01-31 16:19:48 +02:00
RLamm	b8488ab48d	Changed "partial-coding" variables to uint32_t	2020-01-31 16:02:29 +02:00
RLamm	76e3249754	Changed parameter "slicer" to "partial-coding" to avoid confusion.	2020-01-31 14:22:32 +02:00
RLamm	30d5df40c5	Custom headers for the distributed coding	2020-01-29 15:54:49 +02:00
Joose Sainio	54571529a4	Fix accessing previous frame that didn't exist	2020-01-17 10:48:35 +02:00
Joose Sainio	5c671d20e1	Use the new clipping only in situations where it actually helps	2020-01-17 09:08:21 +02:00
Joose Sainio	3c34d7c863	Fix qp estimation and checking of previous frames that don't exist	2020-01-15 09:32:04 +02:00
Joose Sainio	1a35c22a52	Change clipping of lambda and qp for ctus on OBA rc. Instead of clipping qp and lambda to the last value from the state, clip to the previous frame with the same layer and, if such a frame doesn't exist, clip to the previous frame	2020-01-14 14:46:05 +02:00
Pauli Oikkonen	c3d9e97e9f	Fix VS build	2019-12-12 18:34:55 +02:00
Pauli Oikkonen	7f238ca299	Remove debug print functions. Whoops	2019-12-12 18:19:31 +02:00
Pauli Oikkonen	eefb5e50b3	De-inline pred_filtered_dc functions, shouldn't make much difference though	2019-12-12 17:30:00 +02:00
Pauli Oikkonen	169314de4f	32x32 filtered DC prediction in AVX2	2019-12-11 18:17:06 +02:00
Pauli Oikkonen	fb2481b7e4	16x16 filtered DC implemented in AVX2	2019-12-10 15:54:50 +02:00
Joose Sainio	b78aa7b272	save c and k to frame	2019-12-06 10:52:54 +02:00
Joose Sainio	5b10e5fb7e	parameterize the clipping option	2019-12-06 09:51:04 +02:00
Pauli Oikkonen	da370ea36d	Implement AVX2 8x8 filtered DC algorithm	2019-11-28 14:10:10 +02:00
Pauli Oikkonen	5d9b7019ca	Implement a 4x4 filtered DC pred function	2019-11-26 17:05:54 +02:00
Joose Sainio	ca0060cbba	try the original clipping	2019-11-26 15:13:04 +02:00
Pauli Oikkonen	f1485ab087	Start doing an arbitrary size filtered DC pred - maybe easier to just create separate functions for fixed block sizes?	2019-11-25 15:20:29 +02:00
Joose Sainio	ab2fded8af	Update threadwrapper to enable pthread_rwlock_t	2019-11-21 13:38:40 +02:00
Joose Sainio	eb78aead1f	Fix additional potential data races	2019-11-21 11:03:12 +02:00
Joose Sainio	35d7e0d88b	Fix data race	2019-11-21 10:25:04 +02:00
Marko Viitanen	94d89f03c7	Added cfg variable intra_smoothing_disabled and some cleanup	2019-11-20 08:38:33 +02:00
Marko Viitanen	eb2caf9118	Fix intra angle filter, changed from gauss filter table to run-time calculated 4-tap filter	2019-11-19 15:15:21 +02:00
Pauli Oikkonen	979d66031c	Create a strategy out of intra_pred_filtered_dc	2019-11-19 14:50:31 +02:00
Marko Viitanen	466d8772b0	Apply JVET_P0170_ZERO_POS_SIMPLIFICATION in coeff bypass coding	2019-11-19 14:32:38 +02:00
Joose Sainio	0e8815a3d8	test clipping qp to previous frame instead of previous ctus	2019-11-19 14:32:31 +02:00
Joose Sainio	ddb4e5a131	move the intra bit calculation so that it is used also with lambda rc	2019-11-19 14:16:48 +02:00
Joose Sainio	a07833f3e6	check that mallocs in rc initialization were successful; only call kvz_update_after_picture when using the OBA rc	2019-11-19 13:59:44 +02:00
Joose Sainio	50d410a316	re-enable static qp encoding and lambda rc	2019-11-19 13:45:58 +02:00
Pauli Oikkonen	fa4bb86406	Optimize intra_pred_planar_avx2 for 4x4 blocks	2019-11-19 13:39:02 +02:00
Marko Viitanen	3df2642b03	Fix qt cbf context init value	2019-11-19 13:27:36 +02:00
Joose Sainio	57e5615ece	Fix incorrect intra rc calculation skipping	2019-11-19 13:25:31 +02:00
Joose Sainio	6cc3bcd87e	Command line parameters for oba rc and implementation of the usage of the intra parameter	2019-11-19 09:29:06 +02:00
Joose Sainio	eb73548af5	Encode first frame completely before starting others to enable owf	2019-11-18 09:51:37 +02:00
Marko Viitanen	17a53230fd	Code cleanup, remove unused arrays and remove tabs	2019-11-18 09:01:23 +02:00
Pauli Oikkonen	4761d228f9	Start to vectorize the 4x4 loop	2019-11-15 17:32:40 +02:00
Pauli Oikkonen	8d45ab4951	Stupidify the 4x4 planar loop for vectorization	2019-11-14 17:14:04 +02:00
Marko Viitanen	91528f3292	Update contexts	2019-11-14 13:46:51 +02:00
Marko Viitanen	b309ed90be	Fix NAL packet and missing fields in SPS	2019-11-14 09:21:11 +02:00
Marko Viitanen	74514981a9	Fixed PPS, SPS and slice headers and NAL unit types	2019-11-13 15:59:36 +02:00
Joose Sainio	c759c138ed	Prepare the rc data structure to be shared among all frame encoders	2019-11-13 11:56:25 +02:00
Joose Sainio	cdb7c851a4	Fix weight calculation	2019-11-13 08:55:31 +02:00
Joose Sainio	b9b01f8036	WPP with threading	2019-11-12 12:12:57 +02:00
Joose Sainio	615973adca	should enable threading with wpp when owf is not used	2019-11-12 09:03:00 +02:00
Pauli Oikkonen	6f13f6525c	Merge branch 'new_prints'	2019-11-07 17:04:21 +02:00
Joose Sainio	d353f7dd1a	Disable debug prints, fix multiple bugs in the calculation	2019-11-07 15:08:57 +02:00
mercat	57e8c3ebc2	Merge branch 'ML-cplx_red_ICIP'	2019-11-07 13:25:47 +02:00
Pauli Oikkonen	558f0ec401	Mbps, not mbps	2019-11-05 18:06:00 +02:00
Pauli Oikkonen	2edf533925	Tidy the end report printing. Also fix a bug with non-integer target FPS	2019-11-05 17:20:00 +02:00
Joose Sainio	408fd4ccb6	Fix lambda and qp calculation for intra frames. Also fixes a bug with selecting the clip neighbor lambda and clip neighbor qp for inter frames	2019-11-05 10:51:39 +02:00
Pauli Oikkonen	c7313ce567	Store AVG QP information in encmain	2019-11-04 17:08:07 +02:00
Reima Hyvönen	80575c59bf	Some updates done to get right bitrate and avg QP	2019-10-31 15:56:24 +02:00
Reima Hyvönen	252bab8820	Added prints to bitrate and AVG QP	2019-10-31 15:56:24 +02:00
Pauli Oikkonen	6d7a4f555c	Also remove 16x16 (A * B^T)^T matrix multiply. Can be done using (B * A^T) instead, it's the exact same	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	2c2deb2366	Tidy AVX2 32x32 matrix multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	98ad78b333	Tidy the old AVX2 32x32 matrix multiply. It was actually a very good algorithm, just looked messy!	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	4a921cbdb5	Retain data as much in YMM registers as possible. This seems to make it a whole lot quicker	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ac4d710e23	Unroll 32x32 matrix multiply, use all regs	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	a58608d0b8	Remove totally unnecessary (A * B^T)^T 32x32 multiply	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	043f53539f	Implement a streamlined matrix-multiply 32x32 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e9da2d851b	Tidy 32x32 fast DCT's helper functions	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e382339182	Implement fast (butterfly) 32x32 DCT in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	b5962dadac	Tidy indentation in AVX2 16x16 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	36a8f89025	Fine-tune 16x16 AVX2 iDCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	ca9409de2b	Implement 16x16 DCT as butterfly algorithm in AVX2	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	7c69a26717	Use aligned loads and stores for AVX2 DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e9c65dca6	Align DCT matrices and temp transform buffers	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	148a150522	Align DCT source and dest blocks to cache line	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	8e60bbf6a6	Slightly tune 16x16 forward DCT. Use an array of __m256i's to store temporary value, essentially letting the compiler enforce alignment and use aligned loads and stores.	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	c0cc0e8a75	Optimize 16x16 multiply by only slicing right mat once	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	e463d27f22	Implement streamlined generic 16x16 matrix multiply. It can't be this fast for real, can it?	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	beb85ce9d6	Reorder parameters for 8x8 matrix multiplies	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	292af62256	Implement tailored 16x16 forward DCT	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	30ce461d98	Redo 4x4 matrix multiplication	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	07970ea82f	Streamline by-the-book 8x8 matrix multiplication. Also chop up the forward transform into two tailored multiply functions	2019-10-28 16:19:42 +02:00
Pauli Oikkonen	7ec7ab3361	Implement a tailored AVX2 8x8 DCT	2019-10-28 16:19:42 +02:00
Joose Sainio	372934c7db	Fix division by zero	2019-10-10 16:35:56 +03:00
Joose Sainio	9bdfdeaf5c	Rest of the owl	2019-10-09 15:48:58 +03:00
Joose Sainio	1ba8525faf	WIP	2019-10-09 10:35:07 +03:00
Joose Sainio	19496d2692	?	2019-10-03 14:50:11 +03:00
Joose Sainio	4b111e339e	fix a couple of bugs in the implementation, bit calculation still seems a bit off	2019-10-01 15:08:39 +03:00
Joose Sainio	84615e406a	fix compiler warnings	2019-09-27 14:20:08 +03:00
Joose Sainio	14b7a75713	Call the new functions and fix bugs	2019-09-27 14:14:24 +03:00
Joose Sainio	ef74bfb182	unify naming	2019-09-27 10:16:21 +03:00
Joose Sainio	e36f481bda	qp calculation for frame	2019-09-27 09:05:40 +03:00
Joose Sainio	47019ca1cd	intra ck update	2019-09-26 16:04:53 +03:00
Joose Sainio	7c8f4da7cb	Update c and k except after first intra	2019-09-26 13:09:28 +03:00
Joose Sainio	0577d481c1	CTU level code	2019-09-25 12:12:21 +03:00
pkubaj	1d7fcf4227	Fix build on powerpc64 with LLVM	2019-09-12 15:05:00 +02:00
mercat	0de567bfa4	Fix memory leak	2019-09-12 09:45:32 +03:00
mercat	fa116de619	Add static	2019-09-11 16:18:12 +03:00
mercat	b8753a9293	Fucking INLINE fixed	2019-09-11 16:12:07 +03:00
mercat	b855144e68	INLINE fix	2019-09-11 16:12:07 +03:00
mercat	694337b803	Add const and more const	2019-09-11 16:12:07 +03:00
mercat	21c07638ed	Remove const into kvz_init_constraint.	2019-09-11 16:12:06 +03:00
mercat	2bca507abe	Clean version of machine learning constraint code. (ICIP paper)	2019-09-11 16:12:06 +03:00
Alexandre Mercat	0f4b7be6ee	First version of ML ICIP code for master	2019-09-11 16:12:06 +03:00
Pauli Oikkonen	99597b828a	Work around the ancient Win32 calling convention hassle. See if this'll work now	2019-09-06 13:14:42 +03:00
Pauli Oikkonen	c5ca18950c	Revert "Revert to `6924d90052` due to broken visual studio build". This reverts commit `1dd0619bd7`.	2019-09-05 18:21:55 +03:00
Pauli Oikkonen	55529decd5	Implement _mm256_insert_epi32 and extract pseudo-ops. Visual Studio headers apparently lack these guys	2019-09-05 18:20:52 +03:00
Marko Viitanen	28dc4fa2ed	Fix intra MPM selection	2019-09-05 09:39:13 +03:00
Ari Lemmetti	147378e1f9	Prevent 8x4 and 4x8 bipred in merge analysis	2019-09-03 16:32:50 +03:00
Ari Lemmetti	ef1fdbf259	Separate prediction of single PU/PB from CU/CB	2019-09-03 16:32:50 +03:00
Joose Sainio	7d2737bdf6	WIP picture lambda calculation	2019-09-03 11:03:35 +03:00
Ari Lemmetti	3bc510712f	Enable merge analysis for smp and amp	2019-09-02 17:31:51 +03:00
Ari Lemmetti	557bcbc6aa	Make luma or chroma only inter "recon" or predict possible	2019-09-02 17:15:28 +03:00
Marko Viitanen	6d5e20ca13	Header changes to match VTM 6.1	2019-09-02 09:42:35 +03:00
RLamm	60be6d411c	Intra filtering fixed at least for luma. All intra modes output valid luma (hashes match), but chroma is still broken.	2019-08-30 16:14:00 +03:00
RLamm	83ac39094a	Use new PDPC filtering for planar and DC modes	2019-08-29 12:51:34 +03:00
Joose Sainio	131c04f65c	Fix incorrect weight for intra frame	2019-08-29 12:01:13 +03:00
Joose Sainio	8f96678d13	Fix issue with intra frames being part of gop when they shouldn't	2019-08-29 09:28:10 +03:00
Ari Lemmetti	aa8ab195d1	Compare rough cost of the best merge mode against AMVP to make mode decision	2019-08-26 22:49:09 +03:00
Ari Lemmetti	8f866ff83a	Use correct index	2019-08-26 20:10:10 +03:00
Ari Lemmetti	2343958a14	Fix transform split for small luma blocks	2019-08-24 21:50:17 +03:00
Ari Lemmetti	800fc8644d	Reset CBFs because they might have been set for an earlier depth.	2019-08-24 21:49:33 +03:00
Ari Lemmetti	a80de22bc7	Add only different candidates to the list	2019-08-24 21:49:33 +03:00
Ari Lemmetti	45c7961412	Remove tr depth fill. It should not be needed.	2019-08-24 21:49:32 +03:00
Ari Lemmetti	ff8711aaab	Add missing logic to add valid indices to list	2019-08-24 21:49:29 +03:00
Marko Viitanen	cb0d7c340a	Use the new PDPC filtering in angular intra	2019-08-23 14:44:41 +03:00
Marko Viitanen	5bebb18943	Change intra filtering according to VTM6	2019-08-23 08:56:35 +03:00
Marko Viitanen	a16efe6b52	Merge remote-tracking branch 'remotes/github_kvazaar/master' # Conflicts: # build/kvazaar_VS2013.sln # build/kvazaar_VS2015.sln # build/kvazaar_VS2017.sln # build/kvazaar_cli/kvazaar_cli.vcxproj # build/kvazaar_lib/kvazaar_lib.vcxproj # build/kvazaar_tests/kvazaar_tests.vcxproj # src/encode_coding_tree.c # src/encode_coding_tree.h # src/encoder_state-bitstream.c # src/inter.c # src/strategies/avx2/quant-avx2.c	2019-08-22 15:12:01 +03:00
Marko Viitanen	01ea762c1f	Fix coeff coding and remove bdpcm flag -> CABAC bits match with VTM 6.0	2019-08-22 14:33:42 +03:00
Marko Viitanen	210af8adbe	Remove joint_cb_cr flag and fix split_flag context selection	2019-08-22 11:23:24 +03:00
Marko Viitanen	c713d31c93	Fix sig_coeff context selection	2019-08-22 10:57:50 +03:00
Marko Viitanen	48b8898e53	Fix CBF context init and use	2019-08-22 10:44:47 +03:00
Marko Viitanen	db94ec1a84	Rename intra_mode_model -> intra_luma_mpm_flag_model and update the contexts	2019-08-19 15:17:25 +03:00
Marko Viitanen	1c6ffc0a7e	Fix wrong variable types in context init	2019-08-19 14:33:55 +03:00
Marko Viitanen	cd6be15e10	Fix context init to match VTM6.0	2019-08-19 13:57:31 +03:00
Marko Viitanen	3de198d2db	Sync contexts with VTM6.0	2019-08-19 09:39:59 +03:00
Marko Viitanen	e644b03615	Fix headers to match VTM6.0rc1	2019-08-16 15:33:20 +03:00
Ari Lemmetti	1dd0619bd7	Revert to `6924d90052` due to broken visual studio build	2019-08-08 15:15:34 +03:00
Pauli Oikkonen	2852baa673	Separate sign3_diff_epu8 from calc_eo_cat. Just to keep things simple, clear and obvious	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	17947b79ee	Add sao_shared_generics.h in Makefile.am	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	a8dd6ce351	Add a note about having implemented a separate AVX2 version of SAO offset array calculation	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	a858e7dd4b	Combine duplicate code into inline functions	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	de0e97f711	Take 8/16/24b loads and stores into separate functions	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	10979f58fe	Tidy up code	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	9cc11976c0	Combine the delta accumulation from edge and band ddistortion into shared func. This won't reduce object size, but there'll be less duplicate code	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	55d877bd66	Vectorize sao_edge_ddistortion	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	aef0f301d3	Fix function signatures. Mark anything intended as read-only to be const, and fix alignment	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	997fd369b3	Redo calc_sao_edge_dir_avx2. Do it wider, 32 pixels at once!	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	db1e475e02	Use i32 instead of i8 for x/y offsets. Doesn't matter too much, because this number isn't used in SIMD computation, only as a memory reference offset.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	12de466ef5	Reimplement non-band SAO color reconstruction in AVX2. Streamline things to work on 32 pixels at once instead of 8	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	e8bff99329	Redo the SAO_TYPE_BAND subsection of AVX2 SAO color reconstruction. Vectorize it all, hope this helps with perf	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	7b5dffa855	Implement calc_sao_offset_array in AVX2. To be efficient, the AVX2 color reconstruction algorithm will need offsets in byte, not dword, arrays. This is completely specific to 8-bit pixels and the function signature is fundamentally distinct from the generic algorithm, so it's better to not strategize SAO offset array calculation.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	29563b7039	Make kvz_calc_sao_offset_array more obvious. Name temporary values from array lookups etc. that are referred to multiple times, to make the behavior of the mechanism more transparent. Define all the constant values at the beginning of the function and declare as const.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	08881f5e9b	(TEMP) (TODO) (whatever) Avoid compiler warnings. I want the CI to not crash on its -Wall -Werror, but instead to actually build the thing and report to me about actual memory errors etc.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	c18adc5ee0	Redo sao_band_ddistortion_avx2. Avoid branching and do the entire thing on 32 pixels at once in YMMs. Also make the sao_bands function parameter const.	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	2827c3e3ab	Make calc_sao_bands less opaque	2019-08-07 16:35:24 +03:00
Pauli Oikkonen	1bb9a079a8	Fix indentation	2019-08-07 16:35:24 +03:00
Reima Hyvönen	7bc959c7c5	3 sao functions are now working	2019-08-07 16:35:24 +03:00
Reima Hyvönen	0e0f2d3490	made to clear sum vector after it has been set to memory	2019-08-07 16:35:24 +03:00
Reima Hyvönen	f146de7acb	removed some variables to prevent memory losses	2019-08-07 16:35:24 +03:00
Reima Hyvönen	247c3a7a71	converted signed to unsigned int	2019-08-07 16:35:24 +03:00
Reima Hyvönen	ac5c216974	Some more memory error preventing to sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	3fb1cbca35	more editing sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	afbb6fb960	some more modifications to sao_edge_ddistortion_avx2 to prevent memory failures	2019-08-07 16:35:24 +03:00
Reima Hyvönen	3496a57f7a	Edited sao_edge_ddistortion_avx2 to avoid memory overflow	2019-08-07 16:35:24 +03:00
Reima Hyvönen	267ba1d6ce	Modified sao_band_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	e70663b245	added some sub commands to avoid memory read errors	2019-08-07 16:35:24 +03:00
Reima Hyvönen	59dfb4570c	Converted some loads to load int8_t instead ints	2019-08-07 16:35:24 +03:00
Reima Hyvönen	8b253209a8	Found false address load from calc_sao_edge_dir. Should now work like generic	2019-08-07 16:35:24 +03:00
Reima Hyvönen	50e0a47b7a	Took away __restrict	2019-08-07 16:35:24 +03:00
Reima Hyvönen	8a39eb674e	Removed c-variable from calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	bc0a36830d	Clarified some 6 pixel loads	2019-08-07 16:35:24 +03:00
Reima Hyvönen	1a8b211e05	Added break to line 170	2019-08-07 16:35:24 +03:00
Reima Hyvönen	d05e750ebe	Added some switches to prevent segmentation fault from reading	2019-08-07 16:35:24 +03:00
Reima Hyvönen	203580047d	Defined some AVX functions	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c884c738b1	Updated some commands to match the standard	2019-08-07 16:35:24 +03:00
Reima Hyvönen	b412ed2f59	Removed some setr and used loads calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c6cc063534	converted some hadd operations at calc_sao_edge_dir_avx2 to cast and extract	2019-08-07 16:35:24 +03:00
Reima Hyvönen	47ac109b10	optimized some sao_reconstruct_color_avx2 when sao->type == SAO_TYPE_BAND	2019-08-07 16:35:24 +03:00
Reima Hyvönen	96dc60a1ed	first working optimization	2019-08-07 16:35:24 +03:00
Reima Hyvönen	c148aff9fb	Some optimization done to function sao_reconstruct_color_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	bf16ba6cc4	Remade sao_edge_ddistortion_avx2 and calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	79dc39a676	Some editing for sao_edge_ddistortion_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	06ee52924e	some reconst done to calc_sao_edge_dir_avx2	2019-08-07 16:35:24 +03:00
Reima Hyvönen	5fbc65d823	reconst optimization doesn't work yet	2019-08-07 16:35:24 +03:00
Reima Hyvönen	d29f834a69	Remove useless function	2019-08-07 16:35:24 +03:00
Reima Hyvönen	a232a12160	calc_sao_edge_dir_avx2 updated	2019-08-07 16:35:24 +03:00
Reima Hyvönen	b1febc02a5	sao_edge_ddistortion_avx2 now working properly	2019-08-07 16:35:24 +03:00
Reima Hyvönen	cd6092a1ec	Still too many bits, looking for where they appear	2019-08-07 16:35:24 +03:00
Reima Hyvönen	7853be8eeb	Incomplete optimization	2019-08-07 16:35:24 +03:00
Marko Viitanen	dfa5621024	Intrapred cleanup	2019-07-16 14:23:10 +03:00
Ari Lemmetti	40609aa865	Add missing headers to Makefile.am	2019-07-12 19:15:51 +03:00
Ari Lemmetti	5db3a78499	Bump versions for release 1.3	2019-07-09 22:09:32 +03:00
Ari Lemmetti	d513ab1999	Add missing newline	2019-07-09 21:06:05 +03:00
Ari Lemmetti	4967072625	Do not bypass search on skip cu if early_skip is not enabled	2019-07-09 20:20:12 +03:00
Ari Lemmetti	b20992a9f3	Rename functions to be more descriptive	2019-07-09 20:20:11 +03:00
Ari Lemmetti	a348a0ec23	Fix transform depth in early skip	2019-07-09 20:05:48 +03:00
Pauli Oikkonen	8d48bee180	Tidy fast coeff cost code	2019-07-09 18:01:54 +03:00
Pauli Oikkonen	201a43b08e	Clean up the RD-estimation code	2019-07-09 18:01:54 +03:00
Pauli Oikkonen	b111df5073	Create preliminary version of improved cost estimator	2019-07-09 18:01:54 +03:00
Ari Lemmetti	be08a87d94	Add missing parameter max-merge to the help message	2019-07-09 16:28:46 +03:00
Ari Lemmetti	d0bb9b4a6d	Add parameter max-merge to presets	2019-07-09 16:26:03 +03:00
Ari Lemmetti	4097331fd6	Early skip	2019-07-09 15:59:31 +03:00
Marko Viitanen	10d850e98a	Use index_offset in intra angular and change the offset to width+1	2019-07-08 14:23:19 +03:00
Marko Viitanen	3d1fa2a9cf	Fixing angular intra prediction reference pixels	2019-07-08 14:00:02 +03:00
Marko Viitanen	0656c54cab	Fix some problems with reference pixels in angular intra prediction kvz_angular_pred_generic()	2019-07-05 15:54:51 +03:00
Marko Viitanen	89ca2d4ba1	Use correct type for modedisp2sampledisp array	2019-07-05 14:12:10 +03:00
Marko Viitanen	2e8a0d08f9	Fix mvp_idx_model initialization and use	2019-07-05 14:11:29 +03:00
Joose Sainio	977e885ea2	Fix issue with gop=0 introduced in `1c36f68d0c`	2019-07-05 12:57:27 +03:00
Marko Viitanen	c6217e236f	Enable 4-tap filtering for the intra angular	2019-07-04 16:26:10 +03:00
Marko Viitanen	cda6d951c0	Change DCT arrays back to 8-bit -> some frames are now correct	2019-07-04 15:59:10 +03:00
Marko Viitanen	8280bd3217	Add channel info to angular_pred and fix the displacement tables. Also includes 4-tap intra filtering code commented out	2019-07-04 09:35:47 +03:00
Marko Viitanen	5e4369d6b0	Fix the kvz_cabac_encode_aligned_bins_ep function -> cabac coding now correct	2019-07-03 15:55:52 +03:00
Marko Viitanen	3fad4b0a98	Disable kvz_cabac_encode_aligned_bins_ep for now and add a ToDo message	2019-07-03 15:44:35 +03:00
Sami Ahovainio	ce1e67cc3a	Modified header flags to match VTM commit b9080ff45bec368c44f0c43a32dcd6804ef9f5d6	2019-07-01 13:58:15 +03:00
Sami Ahovainio	3863064d90	Fixed bugs in split decision and coefficient coding.	2019-07-01 13:00:43 +03:00
Mikko Pitkänen	a7f09c8114	Merge branch 'threadwrapper'	2019-06-24 16:54:59 +03:00
Sami Ahovainio	db5c0230e5	Fixed coefficient sign hiding	2019-06-20 16:26:01 +03:00
Sami Ahovainio	b51254cafd	Fixed significant coefficient group context calculation	2019-06-20 15:47:13 +03:00
Sami Ahovainio	5e0bea962c	Fixed split context decision	2019-06-20 15:30:49 +03:00
Sami Ahovainio	12322144f0	Removed debug print from context.c	2019-06-20 15:18:22 +03:00
Sami Ahovainio	3a9800d07d	Fixed coefficient coding. Fixed headers to match VTM commit e65075531471a68632bc9252d607655a0feeabc6	2019-06-20 14:43:03 +03:00
Mikko Pitkänen	3dd606ce2e	Add new threadwrapper	2019-06-18 18:45:45 +03:00
Sami Ahovainio	2c78aa0642	Fixes to coeff coding.	2019-06-13 12:01:29 +03:00
Joose Sainio	c94077d15e	remove hardcoded value	2019-06-12 14:37:41 +03:00
Joose Sainio	ac68c8444d	remove negation that wasn't supposed to be there	2019-06-12 14:35:24 +03:00
Joose Sainio	5851dcc3be	missing negation	2019-06-12 14:08:18 +03:00
Joose Sainio	1c36f68d0c	Fix owf>=9 gop=8 and add test to catch such problem in future	2019-06-12 14:04:41 +03:00
Sami Ahovainio	3564b4829e	Fixed split context decision. Modified intra mode initialization to match VTM version aa76fc5c04cf43390f43d63f9977bea8ee31997a.	2019-06-12 12:59:16 +03:00
Sami Ahovainio	a8a53e15b5	Fixed headers to match VTM commit aa76fc5c04cf43390f43d63f9977bea8ee31997a. Added multi_ref_line flag coding.	2019-06-07 13:37:45 +03:00
Ari Lemmetti	933ff6ed55	Merge branch 'set-qp-in-cu-fix'	2019-06-07 09:01:03 +03:00
Sami Ahovainio	8d2581e58c	Fixed issue with kvz_go_rice_par_abs where passing an unsigned argument caused MIN function to return wrong value. Modified coefficient coding to match VTM 5.0. Some issues still remain.	2019-06-05 15:57:18 +03:00
Sami Ahovainio	367f1b2129	Fixed splitting bug caused by wrong values in the headers. Fixed header flags to match VTM commit 5703e81b2de677d976ec15423f5768b17619ba6a	2019-06-05 11:21:02 +03:00
Sami Ahovainio	76d56290ed	Fixed VUI header writing. Fixed debug prints of NAL headers and rbsp_stop_one_bit.	2019-05-31 11:13:11 +03:00
Ari Lemmetti	c6da839002	Set lcu sqrt lambda according to lcu lambda instead of frame lambda when ROI is used	2019-05-29 18:32:10 +03:00
Marko Viitanen	8282a18c36	Fixed headers and NAL writing to match the latest VTM master 988c22cbb9c58584cac3ef0ec7794cafbea6dfd6	2019-05-29 16:18:35 +03:00
Sami Ahovainio	4768ba0628	Minor fixes to header writing. Added contexts for multi_ref_line and BDPCM. Functions added for writing both in bitstream, but they are both disabled for now.	2019-05-29 13:00:19 +03:00
Sami Ahovainio	3339e12169	Fixed some header flags	2019-05-27 09:56:56 +03:00
Ari Lemmetti	9339845e8b	Set QP completely at CU level as the name '--set-qp-in-cu' implies. Move slice delta QP to CU level when using --set-qp-in-cu. Separate functionality from roi.	2019-05-24 20:38:39 +03:00
Pauli Oikkonen	081d16fc33	Fix intrinsics that may be missing on some systems. Create a header to collect all the workarounds for missing intrinsics in one place	2019-05-23 19:59:40 +03:00
Sami Ahovainio	5b46fbd878	Added multi_ref_idx variable for intra coding (is 0 throughout the code for now). Modified prediction flag writing. Chroma pred flag remains unchanged (ToDo). Added bitstream debug printing on VERBOSE mode.	2019-05-21 12:28:05 +03:00
Sami Ahovainio	ed4e218702	Updated coefficient coding to match VTM 5.0	2019-05-13 15:30:43 +03:00
Sami Ahovainio	504c3dfd1b	Modified the headers to match current VTM headers	2019-05-07 16:30:06 +03:00
Marko Viitanen	30a8a7b97c	WIP fixing the last significant xy coding	2019-05-07 15:01:02 +03:00
Pauli Oikkonen	87a9208db8	Eliminate cvtsi64_si128 intrinsic. Apparently it'll cause Win32 builds to break because it emits the movq instruction or something..	2019-04-17 16:30:40 +03:00
Pauli Oikkonen	7175d20bb2	Still include stdint.h for non-vector builds	2019-04-15 19:36:01 +03:00
Pauli Oikkonen	1315c7e2b0	Do not compile any vector code for non-SSE4/AVX2 builds	2019-04-15 19:10:48 +03:00
Pauli Oikkonen	f5f70e7bc5	Merge branch 'sad-optimization'	2019-04-15 19:02:01 +03:00
Jan Beich	85f46e17a9	Detect AltiVec via elf_aux_info() on FreeBSD 12+	2019-04-01 13:08:04 +00:00
Jan Beich	82486255da	Simplify AltiVec detection on Linux	2019-04-01 13:08:04 +00:00
Marko Viitanen	1546acfdb9	New NAL unit IDs and header changes	2019-03-28 10:11:36 +02:00
Marko Viitanen	36eab9c170	New cabac context models with "rate"	2019-03-27 12:38:19 +02:00
Marko Viitanen	3bdc8ac8d3	Fix intra_chroma_pred_mode and cbf contexts	2019-03-26 09:10:09 +02:00
Marko Viitanen	d15f58517f	Changed intra coding to use 6 MPM, implemented merge sort and MPM selection	2019-03-20 15:20:31 +02:00
Marko Viitanen	1081336868	Updated intra pred mode init values	2019-03-20 15:18:32 +02:00
Marko Viitanen	f3acd245ae	New cabac coding function: kvz_cabac_encode_trunc_bin	2019-03-20 15:17:54 +02:00
Marko Viitanen	80d6e4bf05	New split flag calculations	2019-03-20 09:07:58 +02:00
Marko Viitanen	8c84348010	New entropy bit table	2019-03-20 09:07:22 +02:00
Marko Viitanen	2d0348aa6d	New context models	2019-03-20 09:06:57 +02:00
Marko Viitanen	052080747e	New CABAC functions	2019-03-20 09:06:26 +02:00
Marko Viitanen	20667fdba6	Update header bits to VTM 4.0+	2019-03-11 14:02:12 +02:00
Pauli Oikkonen	6d43759604	Create a border-respecting 32-wide AVX hor_sad	2019-03-07 18:01:22 +02:00
Pauli Oikkonen	f218cecb38	Remove offending hor_sad_avx2_w32 function Consider possibly creating a non-offending AVX2 version instead, the way hor_sad_sse41_w32 works. Or maybe there's more essential work to do.	2019-03-05 22:51:41 +02:00
Pauli Oikkonen	df2e6c54fd	4-unroll hor_sad_sse41_arbitrary This may not increase perf though because it's so rarely used function, so keeping icache footprint may be more essential...	2019-03-05 22:45:23 +02:00
Pauli Oikkonen	448eacba7b	Avoid overreading block borders in hor_sad_sse41_arbitrary	2019-03-05 22:34:50 +02:00
Eemeli Kallio	c159e275b7	Merge branch 'max_merge'	2019-03-05 14:39:03 +02:00
Pauli Oikkonen	41f51c08c4	Avoid overrunning buffer in hor_sad_sse41_w32	2019-03-01 15:37:38 +02:00
Pauli Oikkonen	bcd9879359	Include quant coeff range check in non-scaling list execution path too	2019-02-27 17:26:44 +02:00
Pauli Oikkonen	24e6363f64	Remove the kvz_quant_avx2 wrapper function	2019-02-27 16:32:58 +02:00
Pauli Oikkonen	748820f3c5	Eliminate unnecessary loading of coeffs if scaling lists are off	2019-02-27 16:26:35 +02:00
Pauli Oikkonen	5994350f40	Allow quant_flat_avx2 to be used with scaling lists on	2019-02-27 16:25:59 +02:00
Eemeli Kallio	7f4e0acf41	Added check if max-merge is out of bounds	2019-02-19 13:53:42 +02:00
Pauli Oikkonen	9b0e079262	Use SSE instructions for 64-bit SADs instead of MMX VC++ seems to choke on MMX instructions	2019-02-18 20:13:33 +02:00
Pauli Oikkonen	d8b8923028	Add LGPL notices to reg_sad headers	2019-02-18 17:52:47 +02:00
Eemeli Kallio	2a40560888	some variables to const	2019-02-12 11:24:10 +02:00
Eemeli Kallio	8f8e7bb53c	Added possibility to reduce number of maximum number of merge candidates.	2019-02-12 09:21:03 +02:00
Marko Viitanen	1165219842	Update PTL, SPS ext and SPS flags to match VTM 4rc1	2019-02-07 10:00:04 +02:00
Pauli Oikkonen	770db825b9	Create hor_sad_w8 and w4 epol mask the way w16 works	2019-02-06 19:34:26 +02:00
Pauli Oikkonen	aa19bcac8a	Avoid branching in creating shuffle mask in hor_sad_w16	2019-02-06 18:58:46 +02:00
Pauli Oikkonen	2d05ca8520	Remove width from constant-width hor_sad func params They should kinda know it already	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	57db234d95	Move 32-wide SSE4.1 hor_sad to picture-sse41.c It's not used by picture-avx2.c that also includes the header, so it should not be in the header	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	dd7d989a39	Implement 32-wide hor_sad on AVX2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ff70c8a5ec	Utilize horizontal SAD functions for SSE4.1 as well	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	f5ff4db01f	4-wide hor_sad border agnostic	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	35e7f9a700	Fix hor_sad w8 to work with both borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	836783dd6e	Use hor_sad_w32 for both left and right borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	69687c8d24	Modify hor_sad_sse41_w16 to work over left and right borders	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	51c2abe99a	Modify image_interpolated_sad to use kvz_hor_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	1e0eb1af30	Add generic strategy for hor_sad'ing an non-split width block	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	686fb2c957	Unroll arbitrary-width SSE4.1 hor_sad by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	768203a2de	First version of arbitrary-width SSE4.1 hor_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ccf683b9b6	Start work on left and right border aware hor_sad Comes with 4, 8, 16 and 32 pixel wide implementations now, at some point investigate if this can start to thrash icache	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	760bd0397d	Pad the image buffer by 64 bytes from both ends This will be necessary for an efficient and straightforward implementation of hor_sad for blocks over 16 pixels wide, because they cannot use the shuffle trick because inter-lane shuffling is so hard to do	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	c36482a11a	Fix bug in 24-wide SAD facepalm	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	f781dc31f0	Create strategy for ver_sad Easy to vectorize	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	ca94ae9529	Handle extrapolated blocks with unmodified width using optimized_sad pointer	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91b30c7064	Tidy up kvz_image_calc_sad	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	9db0a1bcda	Create get_optimized_sad func for SSE4.1	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91380729b1	Add generic get_optimized_sad implementation NOTE: To force generic SAD implementation on devices supporting vectorized variants, you now have to override both get_optimized_sad and reg_sad to generic (only overriding get_optimized_sad on AVX2 hardware would just run all SAD blocks through reg_sad_avx2). Let's see if there's a more sensible way to do it, but it's not trivial.	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	45f36645a6	Move choosing of tailored SAD function higher up the calling chain	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	91cb0fbd45	Create strategy for directly obtaining pointer to constant-width SAD function	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	94035be342	Unify unrolling naming conventions	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	517a4338f6	Unroll SSE SAD for 8-wide blocks to process 4 lines at once	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	0f665b28f6	Unroll arbitrary width SSE4.1 SAD by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	cbca3347b5	Unroll 64-wide AVX2 SAD by 2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	84cf771dea	Unroll 32 and 16 wide SAD vector implementations by 4	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	5df5c5f8a4	Cast all pointers to const types in vector SAD funcs Also tidy up the pointer arithmetic	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a711ce3df5	Inline fixed width vectorized SAD functions	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	6504145cce	Remove 16-pixel wide AVX2 SAD implementation At least on Skylake, it's noticeably slower than the very simple version using SSE4.1	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4cb371184b	Add SSE4.1 strategy for 24px wide SAD and an AVX2 strategy for 16	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	796568d9cc	Add SSE4.1 strategies for SAD on widths 4 and 12 and AVX2 strategies for 32 and 64	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	4d45d828fa	Use constant-width SSE4.1 SAD funcs for AVX2	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	2eaa7bc9d2	Move SSE4.1 SAD functions to separate header	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	d2db0086e1	Create constant width SAD versions for 8 and 16 pixels	2019-02-04 20:41:40 +02:00
Pauli Oikkonen	a13fc51003	Include a blank AVX2 strategy registration function even in non-AVX2 builds	2019-02-04 19:52:24 +02:00
Pauli Oikkonen	d55414db66	Only build AVX2 coeff encoding when supported ..whoops	2019-02-04 19:34:30 +02:00
Pauli Oikkonen	3fe2f29456	Merge branch 'encode-coeffs-avx2'	2019-02-04 18:52:31 +02:00
Pauli Oikkonen	722b738888	Fix more naming issues	2019-02-04 16:05:43 +02:00
Pauli Oikkonen	e26d98fb75	Rename a couple variables and add crucial comments	2019-02-04 15:57:07 +02:00
Pauli Oikkonen	f186455619	Move encode_last_significant_xy out of strategy modules It's the exact same in both AVX2 and generic, and does not seem to be worth even trying to vectorize	2019-02-04 14:55:41 +02:00
Pauli Oikkonen	3f7340c932	Fine-tune pack_16x16b_to_16x2b Avoid mm_set1 operation when it's possible to create the constant with one bit-shift operation from another instead. Thanks Intel for 3-operand instruction encoding!	2019-02-04 14:44:47 +02:00
Pauli Oikkonen	314f5b0e1f	Rename 16x2b cmpgt function, comment it better, optimize it slightly Eliminate an unnecessary bit masking to make it even more messy	2019-02-04 14:44:32 +02:00
Pauli Oikkonen	d8ff6a6459	Fix _andn_u32 to work on old Visual Studio	2019-02-01 15:34:42 +02:00
Pauli Oikkonen	26e1b2c783	Use (u)int32_t instead of (unsigned) int in reg_sad_sse41	2019-01-10 14:37:04 +02:00
Pauli Oikkonen	3a1f2eb752	Prefer SSE4.1 implementation of SAD over AVX2 It seems that the 128-bit wide version consistently outperforms the 256-bit one	2019-01-10 13:48:55 +02:00
Pauli Oikkonen	9b24d81c6a	Use SSE instead of AVX for small widths Highly dubious if this will help performance at all	2019-01-07 20:12:13 +02:00
Pauli Oikkonen	b2176bf72a	Optimize SSE4.1 version of SAD Make it use the same vblend trick as AVX2. Interestingly, on my test setup this seems to be faster than the same code using 256-bit AVX vectors.	2019-01-07 19:40:57 +02:00
Pauli Oikkonen	887d7700a8	Modify AVX2 SAD to mask data by byte granularity in AVX registers Avoids using any SAD calculations narrower than 256 bits, and simplifies the code. Also improves execution speed	2019-01-07 18:53:15 +02:00
Pauli Oikkonen	7585f79a71	AVX2-ize SAD calculation Performance is no better than SSE though	2019-01-07 16:26:24 +02:00
Pauli Oikkonen	ab3dc58df6	Copy SAD SSE4.1 impl to AVX2	2019-01-03 18:31:57 +02:00
Pauli Oikkonen	45ac6e6d03	Tidy pack_16x16b_to_16x2b comments	2019-01-03 16:37:05 +02:00
Ari Lemmetti	cd818db724	Add missing quantization and residual in cost calculation (inter rd=2).	2018-12-21 15:55:29 +02:00
Pauli Oikkonen	016eb014ad	Move packing 16x16b -> 16x2b into separate function	2018-12-20 10:51:44 +02:00
Ari Lemmetti	b234897e8a	Fix smp and amp blocks in fme and revert previous change. Filter 8x8 (sub)blocks even with 8x4, 4x8, 16x4, 4x16 etc. Calculate SATD on the 8x4, ... part	2018-12-19 21:30:53 +02:00
Pauli Oikkonen	9aaa6f260d	Fixes to enable portability	2018-12-18 20:42:09 +02:00
Pauli Oikkonen	2fdbbe9730	Move CG reordering code from quant-avx2 to shared header	2018-12-18 19:42:18 +02:00
Pauli Oikkonen	d02207306d	Create a header file for shared AVX2 code	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	361bf0c7db	Precompute >=2 coeff encoding loop with 2-bit arithmetic Who needs 16x16b vectors when you can do practically the same with 16x2b pseudovectors in 32-bit general purpose registers!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	940b0e9e6a	Require BMI2 for AVX2 build Any processor implementing AVX2 should also implement BMI2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	f66cb23d5b	Optimize greater1 encoding loop Calculating the c1 variable need not be a serial operation!	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	8c8b791c35	Vectorize kvz_context_get_sig_ctx_inc	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	033261eb74	Eliminate two branches using bit magic	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c4434e8d04	Scan CG's in forward order to simplify finding last significant	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	efd097f5a5	Vectorize the coeff group loop to some extent	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	a01362e638	use the efficient method of reordering raster->scan	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	50a888e789	Use the efficient method to find first and last nz coeffs in block	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	7e9203f566	Scan coeff groups in scan order to help find last significant one	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	9a5a6fdbc7	Simplify two ifs in encode_coeff_nxn-avx2	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	37a2a8bac8	See if loop can be optimized by rearranging	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	584f2f74b6	Vectorize significant coeff group scanning loop	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	1bfed73221	Add AVX2 strategy for encode_coding_tree	2018-12-18 19:41:09 +02:00
Pauli Oikkonen	c3a6f3112a	Add generic strategy group for encode_coding_tree	2018-12-18 19:41:09 +02:00
Marko Viitanen	1ef851ab4b	Disable FME on amp/smp blocks with width or height not divisible by 8	2018-12-18 10:28:21 +02:00
Joose Sainio	b71c5573f0	Merge branch 'rate_control_fix'	2018-12-17 12:39:27 +02:00
Sergei Trofimovich	68a70e45a1	x86 asm: mark stack as non-executable Gentoo's `scanelf` QA tool detects writable/executable stack of assembly-writtent files as: ``` $ scanelf -qRa . 0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-sad.o 0644 LE !WX --- --- ./src/strategies/x86_asm/.libs/picture-x86-asm-satd.o 0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-sad.o 0644 LE !WX --- --- ./src/strategies/x86_asm/picture-x86-asm-satd.o ``` Normally C compiler emits non-executable stack marking (or GNU assembler via `-Wa,--noexecstack`). The change adds non-executable stack marking for yasm-based assmbly files. https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart has more details. Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>	2018-12-16 11:31:56 +00:00
Reima Hyvönen	1fcc5c6a8d	Merge branch 'bipred_recon'	2018-12-11 09:59:35 +02:00
Reima Hyvönen	e4a10880f3	Added case 12 to bipred_recon no mov	2018-12-11 09:52:17 +02:00
Marko Viitanen	a4f3968e52	Fix Visual Studio errors by initializing some variables used in AVX2 signhiding	2018-12-11 09:33:26 +02:00
Ari Lemmetti	ac943147e3	Calculate satd cost for whole non-square blocks as well.	2018-12-10 17:04:29 +02:00
Pauli Oikkonen	c465578048	Add a descriptive comment to coefficient reordering	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	f78bf2ebcb	Optimize q_coefs usage for indexed fetch	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	d9591f1b49	Eliminate midway buffering of reordered coefs TODO: For some mysterious reason seems slightly slower than the buffered one	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	7fe454c51f	Optimize get_cheapest_alternative()	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	6bbd3e5a44	Optimize rearrange_512 function	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	cb8209d1b3	Vectorize transform coefficient reordering loop	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	7cf4c7ae5f	Rename "reduce" functions to hsum That's what the functions fundamendally do anyway	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	316cd8a846	Fix ALIGNED keyword and grow alignment to 64B	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	1befc69a4c	Implement sign bit hiding in AVX2	2018-12-03 15:36:32 +02:00
Pauli Oikkonen	c5cd03497e	Require BMI and ABM instruction sets for AVX2 build AVX2 support on a processor should always imply BMI and ABM support. The lzcnt and tzcnt instructions have more suitable semantics in the corner case that source word is 0, and allow us to even handle that scenario without a branch. Apparently Visual Studio will already include this support when building with AVX2 enabled, so only the automake files need to be tweaked.	2018-12-03 15:36:32 +02:00
Reima Hyvönen	f8696b54a4	Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks that can be not equal to 8 (ie. width = 12)	2018-11-20 17:09:19 +02:00
Marko Viitanen	a5a10a33c3	Enable --scaling-list parameter and add to the documentation	2018-11-19 10:47:30 +02:00
Reima Hyvönen	710ba288db	Chroma has some problems	2018-11-15 16:42:48 +02:00
Sami Ahovainio	8f98d4aac7	Added square search	2018-11-14 14:50:31 +02:00
Marko Viitanen	6871490dd5	Simplify get_mvd_coding_cost(), only include golomb coding	2018-11-14 14:33:31 +02:00
Ari Lemmetti	a832206bb6	Replace 32-bit incompatible instrinsics	2018-11-12 18:54:33 +02:00
Ari Lemmetti	5c774c4105	Rewrite most of FME and interpolation filters Changes had to break a lot of stuff and were just squashed into this horrible code dump	2018-11-08 20:21:16 +02:00
Joose Sainio	1c8a1f24e2	Don't assume anything about bits spent	2018-11-07 16:03:38 +02:00
Joose Sainio	3471e2470d	Fix using uninitialized value for the first frame	2018-11-07 08:17:39 +02:00
Joose Sainio	d95ac11a3b	Fix rate_control for other LP-GOPS	2018-11-06 14:20:44 +02:00
Joose Sainio	67a6ba667e	Fix rate control for flat lp-gop	2018-11-06 09:38:17 +02:00
Reima Hyvönen	7406c33a42	Some more cleaning	2018-10-26 12:25:18 +03:00
Reima Hyvönen	4c71546b2e	Cleaned some coding	2018-10-26 12:19:44 +03:00
Reima Hyvönen	4fe3909e48	Switched luma to use 32bits size ints intstead of 16bit size	2018-10-24 18:24:46 +03:00
Marko Viitanen	465bc2cfee	[EMT] make functions static and prefix arrays with kvz_g	2018-10-18 10:54:33 +03:00
Marko Viitanen	b133e7de1e	VTM 2.2 changed -> remove high_precision_motion_vectors flag	2018-10-17 12:41:14 +03:00
Marko Viitanen	169febd1c4	[EMT] Simplify DCT8, DCT5, DST1 and DST7 definitions	2018-10-17 12:17:54 +03:00
Marko Viitanen	e015d7eb2b	Fix compiler warnings	2018-10-17 10:43:11 +03:00
Marko Viitanen	ad310c77d3	Added EMT transforms to the strategies	2018-10-17 08:56:49 +03:00
Eemeli Kallio	284e73839e	Calculating zero cost moved to its own function	2018-10-16 11:02:01 +03:00
Reima Hyvönen	381e786e10	Trying to find the bug in luma	2018-10-11 18:08:41 +03:00
Marko Viitanen	c589e5ed36	Fix closed-gop frame feed, the ordering was incorrect after the first GOP	2018-10-10 11:12:03 +03:00
Reima Hyvönen	2f5f81bac3	removed the non-optimated bipred function	2018-10-09 11:19:23 +03:00
Marko Viitanen	75dce4f3ce	Fix low-delay-gop usage with --no-open-gop	2018-10-04 15:16:02 +03:00
Marko Viitanen	de71b58f76	Change closed GOP structure to include an additional IDR between GOPs	2018-10-04 11:17:03 +03:00
Marko Viitanen	1e1a80e4a6	[TMVP] fix clamping of block offsets and clean up the code a bit	2018-10-03 12:34:48 +03:00
Reima Hyvönen	212a8e68fa	Modified to avoid memory overflow, still some bug inside luma	2018-10-02 20:23:32 +03:00
Marko Viitanen	954f07e3d7	Add --(no-)open-gop option	2018-10-02 10:05:32 +03:00
Marko Viitanen	027359c3c3	Implement TMVP duplicate checking as in VTM 2.1	2018-09-28 11:50:36 +03:00
Marko Viitanen	571a545416	Fix spatial merge candidate selection	2018-09-26 15:10:31 +03:00
Marko Viitanen	63760ca0cf	Use kvz_cabac_bins_verbose flag to control cabac debug printing	2018-09-26 12:01:23 +03:00
Marko Viitanen	7c37f456f9	Fix implicit Qt split for p-frames	2018-09-26 12:00:18 +03:00
Marko Viitanen	b6f2c66c73	Fixed intra Most Probable Mode (mpm) derivation to conform VTM 2.1	2018-09-21 10:33:54 +03:00
Sami Ahovainio	a2b2275d87	Fixed array sizes in search_intra_rough from 35 to 67	2018-09-18 11:49:15 +03:00
Sami Ahovainio	82fb80ab6e	Fixed couple of if-clauses which still used the old intra mode range.	2018-09-17 08:56:43 +03:00
Marko Viitanen	a437d4c508	Fixed intra chroma mode bitstream writing (chroma search not used)	2018-09-13 15:05:00 +03:00
Marko Viitanen	389aeebe07	Added 2x2 transform functions	2018-09-13 14:51:07 +03:00
Marko Viitanen	445c059b4a	Fix transforms for VTM 2.0, generated new transform matrices and added a shift by 2 for forward and inverse	2018-09-13 14:39:49 +03:00
Marko Viitanen	35fa8e9785	Fix kvz_intra_get_dir_luma_predictor -> Intra working	2018-09-13 12:32:17 +03:00
Marko Viitanen	f75b0b11c3	Simplify intra filtered ref pixel selection	2018-09-13 10:09:52 +03:00
Sami Ahovainio	4bb484a86a	Fixed if-clause at search_intra.c to use new wider range of intra modes	2018-09-13 09:58:48 +03:00
Marko Viitanen	82de0fbee7	Switch intra search to use the actual 67 modes	2018-09-13 09:43:45 +03:00
Marko Viitanen	382917bcd3	New table for choosing angular intra filtered references and a small bugfix on the end condition of angular intra	2018-09-13 09:35:55 +03:00

3175 commits