hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-24 18:34:06 +00:00

Author	SHA1	Message	Date
Arttu Ylä-Outinen	2991962033	Add reference counting to threadqueue_job_t Both the thread queue and the encoder states hold pointers to the thread queue jobs. It is possible that a job is removed from the thread queue and freed while the encoder state is still using it. This commit adds reference counting to threadqueue_job_t in order to fix the problem. Fixes #161.	2017-04-12 16:13:52 +03:00
Arttu Ylä-Outinen	bd8adff43a	Drop unused defines in threads.h	2017-04-12 03:41:07 -07:00
Arttu Ylä-Outinen	7ab0a7aff2	Fix semaphores on Mac POSIX semaphores are deprecated on Mac. This commit replaces POSIX semaphores by Grand Central Dispatch semaphores when building on Mac.	2017-04-12 03:41:02 -07:00
Arttu Ylä-Outinen	26693e1402	Fix reliance on undefined behaviour in encmain Pthread mutexes were used for synchronization in encmain by locking and unlocking them from different threads. However, according to the POSIX standard, unlocking a mutex from a different thread is undefined behaviour. This commit replaces the mutexes by semaphores which can be used from different threads.	2017-04-12 03:23:58 -07:00
Ari Lemmetti	47a9f0de04	Modify and use FILL_ARRAY macro to prevent warning on GCC 7 The following warning was given and is a false positive: 'memset' used with length equal to number of elements without multiplication by element size [-Werror=memset-elt-size]	2017-04-11 14:04:25 +03:00
Eemeli Kallio	f7e01b8ba1	Fixed error on rd=3	2017-04-05 13:27:14 +03:00
Eemeli Kallio	9f605152ae	Changed intra to use best rough cost when using inter and rd=2	2017-04-05 13:01:32 +03:00
Ari Lemmetti	33ce101ab5	Revert "Use sizeof(uint32_t) to avoid warning in GCC7." Did not fix the problem. This reverts commit `e3c3e74926`.	2017-04-03 20:21:33 +03:00
Ari Lemmetti	e3c3e74926	Use sizeof(uint32_t) to avoid warning in GCC7. error: 'memset' used with length equal to number of elements without multiplication by element size [-Werror=memset-elt-size]	2017-04-03 19:16:09 +03:00
Arttu Ylä-Outinen	df359b8f95	Fix indentation in encode_coding_tree.c Fixes indentation of a for loop that was causing a misleading indentation warning on GCC. Fixes #163.	2017-03-08 22:56:28 +09:00
Pierre-Loup Cabarat	2b8ce5e47c	Add intra prediction modes encryption	2017-03-06 17:27:39 +01:00
Arttu Ylä-Outinen	aae141f2d3	Fix order of frames with --debug When the decoding and presentation orders of pictures are different (with GOP), the frames in YUV debug output would be in the decoding order. This commit changes the kvazaar command line program to store the reconstructed pictures in a buffer so that they can be output in the presentation order. Fixes #101.	2017-02-28 14:09:24 +09:00
Arttu Ylä-Outinen	094b39e7fc	Refactor inter MV/merge candidate selection Adds struct merge_candidates_t for holding the spatial and temporal merge candidates. Changes functions with separate parameters for each candidate to use the struct instead.	2017-02-22 15:56:36 +09:00
Arttu Ylä-Outinen	3409748a8f	Refactor inter MVP candidate selection Adds helper function add_mvp_candidate.	2017-02-22 15:56:27 +09:00
Arttu Ylä-Outinen	ef6503c728	Refactor inter merge candidate selection Adds helper function add_merge_candidate and replaces macro CHECK_DUPLICATE with function is_duplicate_candidate.	2017-02-22 02:50:52 +09:00
Arttu Ylä-Outinen	f12e09bc40	Refactor inter TMVP selection Adds helper function add_temporal_candidate to inter.c.	2017-02-22 02:08:10 +09:00
Arttu Ylä-Outinen	4f88066740	Refactor MV and merge candidate selection Replaces macros APPLY_MV_SCALING and CALCULATE_SCALE with helper functions.	2017-02-22 01:14:16 +09:00
Arttu Ylä-Outinen	db08041d9a	Refactor inter TMVP selection Merges three if-clauses to remove two levels of indentation.	2017-02-21 23:56:01 +09:00
Marko Viitanen	85e2a40da3	Clip scaled motion vectors, scale and td/tb values to appropriate limits Fixes #158.	2017-02-20 15:40:20 +02:00
Ari Koivula	7369f25f64	Bump version to 1.1.0	2017-02-16 20:52:05 +02:00
Ari Lemmetti	b021d2244e	Reduce more unnecessary initializations.	2017-02-16 17:25:26 +02:00
Ari Lemmetti	acd12cba1e	Remove unnecessary memory initialization to zero Values in interval [last_scanpos, 0] are overwritten in following for loop, except for the sig_coeff_inc value.	2017-02-16 16:48:48 +02:00
Ari Koivula	7ff33e1bf2	Fix default reference picture count The default was 3, instead of the intended 1 of the medium preset.	2017-02-13 17:34:28 +02:00
Marko Viitanen	4251607c04	Fix a bug in TMVP reference POC list	2017-02-13 15:19:24 +02:00
Marko Viitanen	4270d451e6	Fixed some errors after rebase	2017-02-13 15:19:24 +02:00
Marko Viitanen	95effb00d0	Disable TMVP in frames with zero L0 references	2017-02-13 15:19:24 +02:00
Marko Viitanen	b4de1878be	Fixed TMVP scaling and candidate selection for B-frames	2017-02-13 15:19:23 +02:00
Marko Viitanen	23be633ad7	Added TMVP merge candidate scaling for L0	2017-02-13 15:19:23 +02:00
Marko Viitanen	e6aa1b9b9a	Renamed get_mv_cand_from_spatial() to get_mv_cand_from_candidates()	2017-02-13 15:19:23 +02:00
Marko Viitanen	1124bb5fd0	Cleaned up TMVP, mv candidate selection working, merge candidate selection not	2017-02-13 15:19:23 +02:00
Marko Viitanen	d65d2ec88d	WIP: add list of POCs used in the image when pushing to reference	2017-02-13 15:19:22 +02:00
Marko Viitanen	6a25cd3248	WIP: work on tmvp on inter	2017-02-13 15:19:22 +02:00
Marko Viitanen	e538a94eda	Enable TMVP with B-frames	2017-02-13 15:19:22 +02:00
Arttu Ylä-Outinen	363b8b49a2	Fix integer overflows with large resolutions Limits video size so that the number of luma and chroma pixels can be stored in an int. Fixes some integer overflows that resulted in segmentation faults.	2017-02-12 11:40:13 +09:00
Arttu Ylä-Outinen	a5a925fc28	Replace timed waits by normal waits in threadqueue Replaces calls to pthread_cond_timedwait with pthread_cond_wait in threadqueue.c. Simplifies code, as there should be no need for the timeout.	2017-02-11 15:42:03 +09:00
Arttu Ylä-Outinen	fd057498fc	Simplify kvz_config_alloc	2017-02-11 15:42:03 +09:00
Arttu Ylä-Outinen	7f7844caad	Fix finalizing uninitialized encoder states Finalization functions for frame and tile encoder states accessed the frame and tile fields of the encoder state even though they might be NULL. This is the case when the initialization of an encoder state fails. Fixed by adding NULL checks.	2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen	51786eda67	Drop redundant fields in encoder_control_t Some of the fields in encoder_control_t were simply copies of the corresponding fields in kvz_config. This commit drops the copied fields in favor of using the fields in encoder_control_t.cfg directly.	2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen	6a178dee96	Fix leaking memory when --cqmfile given many times Any previously allocated CQM file name was not freed when allocating memory for the new file name.	2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen	63a567ad8a	Fix leaking memory when --roi given many times Any previously allocated delta QP array was not freed when allocating a new array.	2017-02-09 14:05:21 +09:00
Arttu Ylä-Outinen	bfd89136a4	Fix ROI delta QP array not getting freed	2017-02-09 13:23:55 +09:00
Arttu Ylä-Outinen	e78a8dfcf5	Copy the kvz_config passed to encoder_open The kvz_config struct is created by the user but kvazaar keeps a pointer to it. It is easy to break things by modifying the configuration outside kvazaar. In addition, kvazaar modifies the struct even though it has a const modifier. This commit changes the field cfg in encoder_control_t to be a copy of the kvz_config struct instead of a pointer, removing modifications to the const struct and allowing users to do whatever they want with it after opening the encoder.	2017-02-09 13:23:54 +09:00
Ari Koivula	b8e3513a23	Fix crash with sub-LCU frame sizes and WPP The end of slice was being calculated incorrectly, which led to no tile being created inside the slice, which led to an assert triggering. This fixes the wrong end of slice calculation, but also disallows wavefront rows from being created, if there would be only one. The wavefront initialization code assumes there are always more than one row, so the inter-frame dependency doesn't get added properly. Fixes #153.	2017-02-08 21:41:30 +02:00
Ari Koivula	d893474bab	Fix encoder getting stuck on OS-X Main thread was stuck looping on pthread_cond_timedwait because the abs time given on OS-X had already passed and the wait returned immediately without releasing the mutex to allow worker threads to proceed. Fix was to use the gettimeofday, which returns real time instead of monotonic, which is what pthread_cond_timedwait wants.	2017-02-02 17:27:46 +02:00
Ari Koivula	4ceda1908b	Fix OS-X compiler warning rdo.c:475:25: warning: absolute value function 'abs' given an argument of type 'int64_t' (aka 'long long') but has parameter of type 'int' which may cause truncation of value [-Wabsolute-value] current.cost = -abs(quant_cost_in_bits) + (bits << PRECISION_INC); ^ rdo.c:475:25: note: use function 'llabs' instead current.cost = -abs(quant_cost_in_bits) + (bits << PRECISION_INC);	2017-02-01 18:09:17 +02:00
Ari Koivula	c7d536bbcd	Fix OS-X compiler warning cfg.c:1024:74: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'unsigned long long' [-Wformat] fprintf(stderr, "Too large ROI size: %llu (maximum %zu).\n", size, SIZE_MAX);	2017-02-01 18:09:04 +02:00
Ari Koivula	4467506ef1	Add missing kvz_ prefix	2017-01-31 18:38:02 +02:00
Ari Koivula	ed3bd898fd	Remove Exp-Golomb lookup table This table takes 256kB and isn't used very much. Au revoir!	2017-01-31 18:31:05 +02:00
Ari Koivula	5513744d24	Merge branch 'slices'	2017-01-31 16:14:30 +02:00
Ari Koivula	52904d3e9f	Add --slices=tiles and --slices=wpp This encapsulates tiles or WPP rows into their own slices, making it possible to send them as soon as they are done, instead of waiting for the other substreams to finish and coding the substream offsets in the slice header.	2017-01-31 15:44:23 +02:00
Ari Koivula	0d4d0e869c	Add support for independent slices Not used yet, but they work.	2017-01-31 15:11:50 +02:00
Ari Koivula	46ae382498	Fix bugs with slice header These fixes allow more than one slice to be used to code a picture. - Use correct number of bits to code the slice segment address. - Don't offset_len_minus1 for slices without substreams.	2017-01-31 14:01:59 +02:00
Ari Koivula	f1fc0de2bf	Write slice headers to the parent stream Appending to the child stream doesn't work if the child is a leaf slice state. Simplifies flow by removing distinction between tile and slice. Now that slice headers are written in the parent stream, there is zero difference between tiles and slices from bitstream point of view.	2017-01-31 13:55:05 +02:00
Ari Koivula	04cd875b2c	Move substream finalization to LCU coding job Having some of the termination bits in the LCU coding and some in the substream finalization was needlessly confusing. Doing substream finalization directly after LCU coding makes it easy to verify that the finalization is done correctly. Removes one job per WPP row from the job queue. Removes kvz_cabac_flush, because I don't like bits being put into the bitstream implicitly. Better to have it all in the open.	2017-01-31 13:01:57 +02:00
Ari Koivula	ead490b7b7	Write a new slice NAL for every slice	2017-01-31 12:36:18 +02:00
Ari Koivula	cd496bf50b	Move first_nal_in_au to encoder_state->frame Needed for writing NALs from encoder_state_write_bitstream_children	2017-01-31 12:28:28 +02:00
Arttu Ylä-Outinen	1e6463c08b	Fix inter bipred search When the number of merge candidates was five, biprediction search would read past the bounds of the priority list arrays. Fixed to limit the search to the first four candidates.	2017-01-31 18:23:12 +09:00
Ari Lemmetti	2c069a3e5f	Prevent unnecessary cu search Prevent further analysis as soon as it is known that splitting can not improve cost	2017-01-30 16:21:41 +02:00
Arttu Ylä-Outinen	9b889c3fab	Fix reading ROI files - Checks the return value of fopen when opening the ROI file. Fixes a segfault when the file cannot be opened. - Check that the width and height are positive. Fixes reading past the end of the delta QP array in kvz_set_lcu_lambda_and_qp. - Check for overflow in width * height. Fixes an overflow resulting in a segfault. - Properly check that fscanf succeeds. Fixes silently accepting ROI files that are too short. - Properly close the FILE pointer.	2017-01-29 18:57:27 +09:00
Arttu Ylä-Outinen	46c9a483c3	Fix inter search for small SMP and AMP blocks The function search_pu_inter_ref incorrectly rounded the coordinates of the block down to a multiple of 8 pixels. Small SMP and AMP blocks may start at coordinates that are not multiples of 8. Fixed by removing the rounding. Fixes a failing assert when --mv-constraint is used with --smp or --amp.	2017-01-29 13:34:50 +09:00
Arttu Ylä-Outinen	fb10b56b82	Fix checking if a low delay GOP structure is used Stops assuming that having cfg->gop_lowdelay set means that GOP structure is used since it is possible that cfg->gop_lowdelay is true but cfg->gop_len is zero. Adds checks for cfg->gop_len where needed. Fixes a possible division by zero in kvz_encoder_feed_frame.	2017-01-28 21:56:00 +09:00
Arttu Ylä-Outinen	4f56b04239	Drop an unnecessary conditional Drop a conditional for depth > MAX_DEPTH in search_cu. The depth cannot be greater than MAX_DEPTH (== 3) since an earlier if-clause checks that it is less than MAX_PU_DEPTH (== 4).	2017-01-28 21:35:27 +09:00
Ari Koivula	937a764987	Fix bug in --mv-constraint Subpixel motion estimation returns a 0-vector when no subpixel vector is within the constraint. Fix is to not call subpixel motion estimation when the integer vector is not within the constraint.	2017-01-26 09:55:57 +02:00
Ari Koivula	4a0121ac42	Add --roi parameter Adds region of interest coding capability. Works by reading a file of delta QP values which will then be applied to each frame at LCU level.	2017-01-26 09:14:14 +02:00
Ari Koivula	6f61836989	Refactor kvz_rdoq_sign_hiding Rename and reorder everything to make more sense. - Moved input tables into their own struct and renamed them to what they actually represent. - Renamed pretty much every variable to conform to our style and to make sense. - Removed the lastCG stuff, as the function already gets passed the last coeff anyway. (it was named width, what the hell?)	2017-01-19 23:58:17 +02:00
Ari Koivula	a85390d0ac	Clean up code using the fixed point frac bit tables This is to prepare for changing the code using the floating point table to use the fixed point table instead. This also allows reducing the size of the fractional part, which was useful for finding every place where the fixed point representation is relied upon.	2017-01-19 20:20:51 +02:00
Ari Koivula	24a69c7467	Refactor luma deblocking Changes luma deblocking to use gather and scatter instead of reading to and writing from here and there in memory. Should make them faster and easier to vectorize, or at least cleaner. Splits strong and weak luma deblocking to two functions, as they have almost nothing in common.	2017-01-17 22:13:39 +02:00
Ari Koivula	4cb2fca924	Refactor deblock decision	2017-01-17 19:34:17 +02:00
Arttu Ylä-Outinen	05794c3548	Add missing static to function lambda_to_qp	2017-01-11 15:53:55 +09:00
Arttu Ylä-Outinen	ee518e8ac4	Take header bits into account in rate control	2017-01-11 15:53:55 +09:00
Arttu Ylä-Outinen	c219d3cd94	Fix deblock when CU QP delta is enabled Fixes deblock functions so that they use the correct QP for the filtered edge. Adds field qp to cu_info_t.	2017-01-11 15:53:22 +09:00
Arttu Ylä-Outinen	82a98180e4	Clip LCU lambda to reduce quality fluctuation Limits lambdas for each LCU based on the computed lambda from the previous frame and the frame-level lambda.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	93172fd251	Use separate alpha, beta and lambda for each LCU Changes rate control to use the alpha and beta values stored in lcu_stats_t instead of the frame-level values when selecting lambda and QP for an LCU.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	3af4e9cc8a	Allocate bits separately for each LCU Bits are allocated based on the costs of the LCUs in the previous completely coded frame. Breaks deblock when rate control is used.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	ff5e5ec6d4	Record info about coded LCUs Adds field lcu_stats to encoder_state_config_frame_t. The following data is recorded for each LCU: - number of bits - squared cost - used lambda value - alpha parameter used for rate control - beta parameter used for rate control	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	2a4243acbe	Refactor rate control Moves all code related to setting QP and lambda values to rate_control module.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	71633889ce	Enable CU QP delta when using rate control When rate control is enabled, enable cu_qp_delta_enabled_flag in PPS with diff_cu_qp_delta_depth set to 0. Also adds code for writing the QP deltas and a new cabac context.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	640ff94ecd	Use separate lambda and QP for each LCU Adds fields lambda, lambda_sqrt and qp to encoder_state_t. Drops field cur_lambda_cost_sqrt from encoder_state_config_frame_t and renames cur_lambda_cost to lambda.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	435c387357	Refactor rate control - Defines MIN_LAMBDA and MAX_LAMBDA constants. - Moves resetting state->frame->cur_gop_bits_coded to rate_control.c. - Changes gop_allocate_bits to return the number of bits allocated like pic_allocate_bits does.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	6c4f2d196a	Move fields from encoder_state_t to frame Moves fields prepared and frame_done from encoder_state_t to encoder_state_config_frame_t.	2017-01-09 01:24:23 +09:00
Arttu Ylä-Outinen	97863cdaa2	Fail encoder init when CQM file cannot be opened	2017-01-08 19:17:43 +09:00
Arttu Ylä-Outinen	db5e750c7f	Fix --threads=auto When --threads=auto was given on the command line, cfg->threads was actually set to zero, disabling threads altogether. Fixed to set cfg->threads to -1, so that the number of threads is chosen automatically.	2017-01-08 17:58:22 +09:00
Ari Koivula	a9e45efcfc	Add a fast lane for byte-aligned bitstream writes The CABAC engine only writes to the bitstream when it has a full byte. These writes are also always byte-aligned, so there is no need to even check for stream alignment. Speedup was around 3% with ultrafast and low QP.	2016-12-23 17:01:44 +02:00
Jaakko Laitinen	deb63f735f	Fix gop disabling	2016-12-20 14:25:13 +02:00
Ari Lemmetti	70a52f0e48	10-bit: add missing bit depth adjustment to ssd	2016-11-17 19:28:04 +02:00
Ari Koivula	fa078102f1	Fix 32bit compilation Got a warning about implicit cast from uint64_t to void*.	2016-11-17 17:53:57 +02:00
Ari Koivula	5ceec06bd3	Merge pull request #148 from Venti-/crypto Crypto	2016-11-16 21:33:55 +02:00
Ari Lemmetti	c31207ea7d	Optimize intra reference building -Add function with reduced logic for the most common case	2016-11-16 18:28:42 +02:00
Ari Koivula	24f2a23ef8	Remove unnecessary crypto state The frame does not need its own crypto state, since it always has at least one sub tile.	2016-11-16 13:58:41 +02:00
Ari Koivula	8951e34fd2	Change crypto.h stubs to print instead of assert	2016-11-16 13:58:41 +02:00
Wassim Hamidouche	ea82c38906	correct memory allocation	2016-11-16 12:35:28 +02:00
Wassim Hamidouche	da3e2d1d07	resolve parallel encryption	2016-11-16 12:35:28 +02:00
Ari Koivula	b8a618e666	Fix problems with >8 bit input Enforce bit depth promised by --input-bitdepth to avoid crashes when larger values are provided. Do endianness byte swap for all bytes when the buffer gets extended to multiple of 8 pixels, and not just the number of input pixels. Don't swap bytes on a little-endian system.	2016-11-13 19:58:54 +02:00
Ari Koivula	2c005cda25	Fix bug with sub-pixel motion estimation in tiles The width of the tile was being used to index the frame pixel buffer instead of the width of the buffer.	2016-11-07 15:53:52 +02:00
Ari Koivula	78a28e0338	Reformat --help message - Reduce indentation to 6 spaces - Word wrap everything to under 80 characters - Remove defaults from options covered by presets - Add a dash in front of argument descriptions - Add --(no-) to names of parameters that accept it and remove mention of enabling or disabling - Add executable and scripts as a dependency to make docs	2016-11-04 15:40:28 +02:00
Ari Koivula	d18de19d8a	Fix DTS and PTS not being passed on through lib API Fixes "cur_dts is invalid" warning from FFmpeg.	2016-10-28 19:05:47 +03:00
Ari Koivula	0c41c2ebd6	Make CLI set PTS for each input picture This value is not represented in the HEVC bitstream, which is why it was not set previously. FFmpeg sets and needs it however, so make the CLI set it as well to make sure we handle it correctly.	2016-10-28 19:03:03 +03:00
Ari Koivula	5bf745460d	Re-categorize options in the help message - Move VUI stuff to the bottom - Merge Parallel processing, WPP, Tiles and slices - Add more categories for the other options	2016-10-27 03:26:15 +03:00
Ari Koivula	cb6672b452	Disable WPP when Tiles are enabled Closes #142.	2016-10-27 02:07:10 +03:00
darealshinji	488d042e5f	Bump KVZ_VERSION	2016-10-25 12:32:13 +02:00
Ari Lemmetti	29153ed503	Remove unused variable	2016-10-21 17:28:42 +03:00
Ari Lemmetti	778e46dfd8	Add AVX2 version of SSD	2016-10-21 15:07:53 +03:00
Ari Lemmetti	6f5d7c9e06	Move SSD to strategies	2016-10-21 15:07:23 +03:00
Ari Lemmetti	89b941eab4	Fix typo	2016-10-21 15:07:02 +03:00
Alexis Ballier	1dcc993743	Include i386 & i486 for compiling intel asm. x86_64-pc-linux-gnu-gcc -m32 that I use for building 32bits libraries on amd64 defines only __i386__.	2016-10-14 18:07:37 +02:00
Arttu Ylä-Outinen	5fb7afe8c4	Add --implicit-rdpcm command line parameter. Makes it possible to use lossless coding without implicit residual DPCM.	2016-10-03 20:01:55 +09:00
Arttu Ylä-Outinen	5affc0f527	Use implicit RDPCM in lossless mode. Sets implicit RDPCM flag in SPS when lossy coding is disabled and applies DPCM to intra residual when prediction mode is horizontal or vertical.	2016-10-03 19:31:38 +09:00
Ari Koivula	016dbe0894	Further refine presets The rd-complexity of slow presets is better with a less aggressive GOP. Adding the GOP as part of the preset improved BDRate enough, that it didn't make sense anymore to have a veryslow target the best BDRate. Instead, push that responsibility to placebo by making it a little bit faster.	2016-09-29 17:35:12 +03:00
Ari Koivula	31c5ff0f16	Add cross-platform core number detection Well, turns out pthread_num_processors_np isn't standard so we need to do this crap. Threw in hyper threading detection as a bonus.	2016-09-29 00:03:21 +03:00
Ari Koivula	8c7351eac8	Fix lp-gop with depth 1 GOPs with depth 1 had the same structure as those with depth 2: g4d3t1 = 3 2 3 1 g4d2t1 = 2 2 2 1 g4d1t1 = 2 2 2 1 It now results in the correct: g4d1t1 = 1 1 1 1	2016-09-29 00:03:21 +03:00
Ari Koivula	a395aeaac9	Set default settings to those of --preset=medium	2016-09-29 00:03:21 +03:00
Ari Koivula	4388fe0d30	Set presets to rate-distortion-complexity optimized versions	2016-09-29 00:03:20 +03:00
Ari Koivula	facb1e16df	Use -p64 -q22 and --gop=lp-g4d3t1 by default Coding inter without GOP of any kind really isn't a very sensible default. Defaulting to B-GOP of some kind would be better, but lp-gop is more robust for now.	2016-09-29 00:03:20 +03:00
Ari Koivula	d7391a9593	Improve default for number of parallel frames	2016-09-29 00:03:20 +03:00
Ari Koivula	19d423ab29	Use all available cores by default	2016-09-29 00:03:20 +03:00
Ari Koivula	3f138f087a	Allow non-gop-length --period for lp-gop	2016-09-29 00:03:19 +03:00
Ari Koivula	16790c9f15	Remove number of references from --gop=lp syntax The number of references should be part of the presets, so gop should be defined separately.	2016-09-29 00:03:19 +03:00
Ari Koivula	cbfa824d1a	Merge branch 'simd'	2016-09-27 20:49:45 +03:00
Ari Koivula	14a7bcba25	Use a faster function for clipped inter SAD Use the vectorized general SSE41 inter SAD in AVX reg_sad for shapes for which we don't have AVX versions yet. Also improves speed of --smp and --amp a lot. Got a 1.25x speedup for: --preset=ultrafast -q 27 --gop=lp-g4d3r3t1 --me-early-termination=on --rd=1 --pu-depth-inter=1-3 --smp --amp * Suite speed_tests: -PASS inter_sad: 0.898M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec) +PASS inter_sad: 2.503M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec) -PASS inter_sad: 115.054M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec) +PASS inter_sad: 133.577M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)	2016-09-27 20:48:30 +03:00
Arttu Ylä-Outinen	4313e56c2d	Add --no-rdoq-skip command line switch	2016-09-11 17:40:16 +09:00
Ari Koivula	a7a33b08ec	Remove --slice-addresses from usage message And give a warning if it's used. Slices will have to be implemented at some point, but they aren't yet so let's not advertize them.	2016-09-10 21:06:00 +03:00
Eemeli Kallio	f41e428e5f	Removed kvz_skip_unnecessary_rdoq and reworked --rdoq-skip to skip 4x4 blocks when it is on.	2016-09-09 10:26:07 +03:00
Eemeli Kallio	ed9c0b0416	RDOQ reworked in rdo.c. rdoq_signhide now skips coeffs that are after best_last_idx.	2016-09-09 10:16:51 +03:00
Ari Koivula	02cd17b427	Add faster AVX inter SAD for 32x32 and 64x64 Add implementations for these functions that process the image line by line instead of using the 16x16 function to process block by block. The 32x32 is around 30% faster, and 64x64 is around 15% faster, on Haswell. PASS inter_sad: 28.744M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec) PASS inter_sad: 7.882M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec) to PASS inter_sad: 37.828M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec) PASS inter_sad: 9.081M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)	2016-09-01 21:36:39 +03:00
Ari Koivula	d0512d25c6	Use fixed point in get_mvd_coding_cost	2016-08-30 21:37:12 +03:00
Ari Koivula	ec7507a935	Further optimize get_ep_ex_golomb_bitcost Unrolled 16-bit log2 calculation.	2016-08-30 21:37:01 +03:00
Ari Koivula	a4ba794587	Optimize get_ep_ex_golomb_bitcost Arrange the decision tree such that there are only 3 branches on the most common paths and the more likely branch is always fall-through. A profile guided optimization pass would probably do something similar.	2016-08-30 05:24:16 +03:00
Ari Koivula	82cfab58f8	Improve fast mvd coding cost estimation A lot of time is being taken up by this function on ultrafast, and it doesn't do a very good job. This change aims to both simplify the logic and make the estimate better. The logic is simplified by using a look up for the step mvd bit cost step function instead of mimicking the binarization process. The estimation is made better by checking fractional cabac bit costs. The new function returns the same results as kvz_get_mvd_coding_cost_cabac, but is also faster than the old function.	2016-08-30 04:55:09 +03:00
Ari Koivula	d31be8eb27	Make mvd_coding_cost functions take const cabac	2016-08-30 04:46:46 +03:00
Ari Koivula	64d631c174	Fix 8bit to 10bit input conversion regression	2016-08-25 22:09:40 +03:00
Ari Koivula	27789125d8	Fix input bit depth conversion The input was being shifted to the wrong direction.	2016-08-25 22:05:25 +03:00
Ari Koivula	4ec039004b	Add monochrome encoding Write bitstream without chroma when encoding with --input-format=P400. This reduces bitstream size by 0-1 %, compared to coding monochrome in 420 format, and speeds up encoding slightly due to not processing chroma.	2016-08-25 20:15:26 +03:00
Ari Koivula	c5b70cf812	Add chroma format support to yuv_t	2016-08-24 19:20:53 +03:00
Ari Koivula	032ed30ff4	Add chroma format support to kvz_picture Add picture_alloc_csp to libkvz api to allocated pictures with chroma format different from 420.	2016-08-24 19:20:53 +03:00
Ari Koivula	48ccc26839	Add --input-format and --input-bitdepth Adds reading of 10 bit input for 10-bit encoding.	2016-08-24 19:20:53 +03:00
Ari Koivula	cc08073615	Refactor some indexing weirdness in init_lcu_t I thought there might be a bug in this so I cleaned it up.	2016-08-24 19:12:48 +03:00
Ari Koivula	b6d674d66e	Refactor integer vector inter prediction This code was pretty bad, so I cleaned it up a bit.	2016-08-24 19:09:26 +03:00
Ari Lemmetti	28c4174d0e	Fix incorrect shuffle parameters _MM_SHUFFLE uses reverse order	2016-08-23 19:40:46 +03:00
Ari Lemmetti	ce77bfa15b	Replace KVZ_PERMUTE with _MM_SHUFFLE The same exact macro already exists	2016-08-22 19:08:46 +03:00
Jovasa	68eef660bd	Fixed search around mv_in in fullsearch not being saved.	2016-08-19 15:19:29 +03:00
Eemeli Kallio	99d8b9abeb	Changed skip_rdoq name to kvz_skip_unnecessary_rdoq. Changed the order it uses when it goes through CGs and tuned its sum calculation.	2016-08-18 14:02:56 +03:00
Eemeli Kallio	1fb4755f31	Added rdoq-skip to quant-generic.c	2016-08-18 12:17:54 +03:00
Eemeli Kallio	d20ac03ca2	Added --rdoq-skip option	2016-08-18 12:17:53 +03:00
Marko Viitanen	83cf801664	Fixed MV constraint condition in bipred	2016-08-18 08:53:17 +03:00
Marko Viitanen	5ae1c595f2	Fixed slice_temporal_mvp_enabled_flag and disabled TMVP with tiles - slice_temporal_mvp_enabled_flag should be signalled also with non-IDR I-slices	2016-08-10 14:51:41 +03:00
Marko Viitanen	5326519182	TMVP cleanup and const qualifier fixes	2016-08-10 14:10:43 +03:00
Marko Viitanen	f40907260d	Added config parameter for TMVP and cmdline option --no-tmvp - Enabled by default - Cannot be used with GOP at the moment	2016-08-10 14:09:29 +03:00
Marko Viitanen	fd52dac1f7	Fixed TMVP scaling	2016-08-10 14:09:28 +03:00
Marko Viitanen	c664bc8cf7	Added flag collocated_ref_idx to the slice header	2016-08-10 14:09:28 +03:00
Marko Viitanen	c5f2611a38	Fixes for TMVP to work with the new CU array	2016-08-10 14:09:28 +03:00
Marko Viitanen	d85af5755b	TMVP working when only 1 ref frame	2016-08-10 14:09:28 +03:00
Marko Viitanen	39f0165efe	Fix a bug in TMVP, the reference cu_array was being overwritten	2016-08-10 14:09:27 +03:00
Marko Viitanen	adab8c327e	Clean TMVP code	2016-08-10 14:09:20 +03:00
Marko Viitanen	5fa8226ac9	Temporal merge candidate selection	2016-08-10 14:09:20 +03:00
Marko Viitanen	f83042f4a1	Temporal MV candidate selection	2016-08-10 14:09:19 +03:00
Marko Viitanen	f8671581e3	Implemented function kvz_inter_get_temporal_merge_candidates()	2016-08-10 14:09:19 +03:00
Marko Viitanen	2956bdb379	Added flag slice_temporal_mvp_enabled_flag	2016-08-10 14:09:19 +03:00
Arttu Ylä-Outinen	2a946bd88e	Rename encoder_state_t.global to frame "Frame" is more accurate than "global" since when OWF is used, encoder states for each frame have their own struct.	2016-08-10 13:22:36 +09:00
Arttu Ylä-Outinen	5fbb0a8c27	Fix includes	2016-08-10 13:05:40 +09:00
Arttu Ylä-Outinen	aabf6ca3ee	Extract encoding code from encoderstate.c Moves functions kvz_encode_coding_tree and kvz_encode_coeff_nxn from encoderstate.c to encode_coding_tree.c.	2016-08-09 22:16:50 +09:00
Arttu Ylä-Outinen	803f29be8f	Remove reconstructed picture allocation in lossless. Changes encoder_set_source_picture to set the reconstructed picture to a copy of the source picture instead of allocating a new picture when lossless coding is used.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	aaec473a19	Refactor encoder state initialization. - Moves allocation of the reconstructed picture after the source picture is set. - Extracts main state initialization to a separate function from encoder_state_new_frame. - Changes kvz_encoder_feed_frame to return the frame. - Renames some functions to better match their purpose.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	cd7024b3a5	Skip computing SSD when using lossless coding. The SSD is always zero since it is lossless.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	fbbe5d1844	Use kvz_pixels_calc_ssd for SSD in search.c. Replaces loops for computing SSDs by calling kvz_pixels_calc_ssd in search.c.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	22cc97ffb1	Fix missing field initializers.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	06b82bf888	Disable filters, trskip and signhide in lossless. When lossless coding is used, deblock and SAO are skipped, transform skip flag is not written and sign hiding is not used.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	97451ec401	Align assignments in encoder.c.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	1dc94663c3	Bypass transform and quantization with --lossless. When --lossless is given, set cu_transquant_bypass_flag for every CU and bypass transform and quantization by directly copying reference pixels to reconstruction and the residual to coefficients.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	2113b0182d	Enable PPS-level tq bypass flag with --lossless. Sets transquant_bypass_enable_flag to true in PPS when --lossless is given.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	a5897bbece	Make cabac context initialization tables static.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	23e7d9bb37	Add --lossless command line parameter.	2016-08-03 14:25:08 +09:00
Arttu Ylä-Outinen	5372ea432f	Update README and manpage.	2016-08-03 14:25:08 +09:00
Ari Lemmetti	6bcba004ff	Comment out to fix unused code error on clang.	2016-07-14 14:12:16 +03:00
Ari Lemmetti	c0979ebdcb	Implement AVX2 luma sampling	2016-07-14 12:53:02 +03:00
Ari Lemmetti	6244560426	Add avx2 strategy for kvz_filter_frac_blocks_luma.	2016-07-14 12:53:02 +03:00
Ari Lemmetti	9c4e9e049b	Load only what is needed. Eliminate latency from hadds.	2016-07-14 12:53:01 +03:00
Ari Lemmetti	7f71cb423a	Check 4 fractional pixel positions simultaneously	2016-07-14 12:52:24 +03:00
Ari Lemmetti	ad445ab8a1	Transition to kvz_filter_frac_blocks_luma	2016-07-14 12:51:02 +03:00
Ari Lemmetti	fccfbd2f28	Add strategy for kvz_filter_frac_blocks_luma	2016-07-14 12:51:02 +03:00
Ari Lemmetti	e9c3074d32	Add buffers and definitions for upcoming filtering Samples are to be filtered in separate blocks instead of making one big picture with interpolated pixels	2016-07-14 12:51:02 +03:00
Ari Lemmetti	7afe7e963b	Use fme_level to control the search accuracy.	2016-07-14 12:51:01 +03:00
Ari Lemmetti	5fa323bf25	Skip searching best hpel twice. Make hpel and qpel loops similar.	2016-07-14 12:51:01 +03:00
Ari Lemmetti	bc98a9affa	Change the search order to suit lighter fme search	2016-07-14 12:51:01 +03:00
Ari Lemmetti	2b0c8db349	Add quad satd for avx2	2016-07-14 12:50:24 +03:00
Ari Lemmetti	0ff69fd6f8	Add any size multi satd	2016-07-14 12:48:37 +03:00
Ari Lemmetti	d17b9e7d6e	Allow subme parameters 0-4 Update usage, presets, defaults, lib version	2016-07-12 19:49:38 +03:00
Arttu Ylä-Outinen	62ad57d0bf	Fix kvz_image_list_add for zero-sized lists. When a list does not have space for the new element, its size is doubled. If the size of the list is zero, it would not be resized. Fixed to always resize the list so that the new element can be added.	2016-06-22 13:35:16 +09:00
Arttu Ylä-Outinen	433e528af7	Drop unused variable in search_pu_inter. Removes unused variable max_px_below_lcu.	2016-06-22 13:35:16 +09:00
Arttu Ylä-Outinen	7836ff6ec9	Drop unused functions. Removes functions kvz_coefficients_calc_abs, kvz_intra_rdo_cost_compare and kvz_rdo_cost_intra which are no longer used.	2016-06-22 13:35:15 +09:00
Arttu Ylä-Outinen	e4b5840f56	Add parentheses around macro arguments in cabac.h.	2016-06-22 13:35:15 +09:00
Arttu Ylä-Outinen	a387b74e51	Fix resolution auto-detection. Only try to guess the resolution from filename when neither width nor height is given.	2016-06-22 13:35:15 +09:00
Arttu Ylä-Outinen	097bf8f3c0	Add a typedef for mvd coding cost functions.	2016-06-20 13:56:10 +09:00
Arttu Ylä-Outinen	d3c0e49286	Update comments.	2016-06-16 20:25:08 +09:00
Arttu Ylä-Outinen	ae832cda8c	Pack cbf flags in cu_info_t to two bytes. Reduces size of cu_info_t.	2016-06-16 20:24:19 +09:00
Arttu Ylä-Outinen	cad2d496b8	Enable 4x8 and 4x16 partition modes Enables search for 2NxN and Nx2N partition modes for 8x8 CUs and 2NxnU, 2NxnD, nLx2N and nRx2N partition modes for 16x16 CUs. Changes the loop for copying reconstructed luma pixels in kvz_inter_recon_lcu to use 4 byte chunks instead of 8 byte chunks since it is now possible to have 4 pixel wide blocks.	2016-06-16 20:23:16 +09:00
Arttu Ylä-Outinen	90df7350f0	Make deblocking work with 4 pixel wide blocks.	2016-06-16 20:21:50 +09:00
Arttu Ylä-Outinen	bf26661782	Add support for 4x4 blocks to SATD_ANY_SIZE. Makes functions satd_any_size_generic and satd_any_size_8bit_avx2 work on blocks whose width and/or height are not multiples of 8.	2016-06-16 18:53:17 +09:00
Arttu Ylä-Outinen	2ae260e422	Change width of cells in lcu_t to 4 pixels. Intra mode info for NxN partition units is now stored in the corresponding 4x4 cell in lcu_t.cu array.	2016-06-16 18:53:17 +09:00
Arttu Ylä-Outinen	360f5bb8da	Always use pixel coordinates for indexing lcu_t. Removes macro LCU_GET_CU and uses LCU_GET_CU_AT_PX in its place.	2016-06-16 18:53:17 +09:00
Arttu Ylä-Outinen	46e8122d27	Add functions for indexing cu_array_t structures. Replaces macro CU_ARRAY_AT with functions kvz_cu_array_at and kvz_cu_array_at_const.	2016-06-16 18:52:19 +09:00

2211 commits