hashirama/uvg266

mirror of https://github.com/ultravideo/uvg266.git synced 2024-11-24 02:24:07 +00:00

Author	SHA1	Message	Date
Ari Lemmetti	02c9e3746c	Add AppVeyor badge	2016-11-16 17:12:36 +02:00
Ari Koivula	b8a618e666	Fix problems with >8 bit input Enforce bit depth promised by --input-bitdepth to avoid crashes when larger values are provided. Do endianness byte swap for all bytes when the buffer gets extended to multiple of 8 pixels, and not just the number of input pixels. Don't swap bytes on a little-endian system.	2016-11-13 19:58:54 +02:00
Ari Koivula	2c005cda25	Fix bug with sub-pixel motion estimation in tiles The width of the tile was being used to index the frame pixel buffer instead of the width of the buffer.	2016-11-07 15:53:52 +02:00
Ari Koivula	bb33cd3901	Update lp-gop syntax in README	2016-11-04 17:22:24 +02:00
Ari Koivula	78a28e0338	Reformat --help message - Reduce indentation to 6 spaces - Word wrap everything to under 80 characters - Remove defaults from options covered by presets - Add a dash in front of argument descriptions - Add --(no-) to names of parameters that accept it and remove mention of enabling or disabling - Add executable and scripts as a dependency to make docs	2016-11-04 15:40:28 +02:00
Ari Koivula	98a0d54b70	Merge branch 'dts-fix'	2016-10-28 19:06:22 +03:00
Ari Koivula	d18de19d8a	Fix DTS and PTS not being passed on through lib API Fixes "cur_dts is invalid" warning from FFmpeg.	2016-10-28 19:05:47 +03:00
Ari Koivula	0c41c2ebd6	Make CLI set PTS for each input picture This value is not represented in the HEVC bitstream, which is why it was not set previously. FFmpeg sets and needs it however, so make the CLI set it as well to make sure we handle it correctly.	2016-10-28 19:03:03 +03:00
Ari Koivula	c9cfe8d76b	Merge branch 'help'	2016-10-27 03:32:22 +03:00
Ari Koivula	c7da5e981b	Update README and manpage	2016-10-27 03:29:53 +03:00
Ari Koivula	5bf745460d	Re-categorize options in the help message - Move VUI stuff to the bottom - Merge Parallel processing, WPP, Tiles and slices - Add more categories for the other options	2016-10-27 03:26:15 +03:00
Ari Koivula	cb6672b452	Disable WPP when Tiles are enabled Closes #142.	2016-10-27 02:07:10 +03:00
Ari Koivula	4990b0d528	Merge pull request #145 from darealshinji/patch-1 Bump KVZ_VERSION	2016-10-25 19:42:19 +03:00
Ari Koivula	6a162f3bc5	Merge pull request #144 from wiiaboo/appveyor Add appveyor scripts to test with MSYS2	2016-10-25 19:41:01 +03:00
darealshinji	488d042e5f	Bump KVZ_VERSION	2016-10-25 12:32:13 +02:00
Ricardo Constantino	e269b86539	Add appveyor scripts to test with MSYS2	2016-10-21 15:39:29 +01:00
Ari Lemmetti	29153ed503	Remove unused variable	2016-10-21 17:28:42 +03:00
Ari Lemmetti	a1390ca3c0	Merge branch 'ssd-avx2'	2016-10-21 15:08:44 +03:00
Ari Lemmetti	778e46dfd8	Add AVX2 version of SSD	2016-10-21 15:07:53 +03:00
Ari Lemmetti	6f5d7c9e06	Move SSD to strategies	2016-10-21 15:07:23 +03:00
Ari Lemmetti	89b941eab4	Fix typo	2016-10-21 15:07:02 +03:00
Ari Koivula	bfdd492c9f	Merge pull request #141 from aballier/multilib Include i386 & i486 for compiling intel asm.	2016-10-19 21:19:25 +03:00
Alexis Ballier	1dcc993743	Include i386 & i486 for compiling intel asm. x86_64-pc-linux-gnu-gcc -m32 that I use for building 32bits libraries on amd64 defines only __i386__.	2016-10-14 18:07:37 +02:00
Arttu Ylä-Outinen	8ae791a3e1	Fix building with crypto++ Depending on the distro, the pkg-config package name of crypto++ could be either cryptopp or libcrypto++. This commit changes configure to check for both instead of cryptopp only.	2016-10-10 15:13:20 +09:00
Arttu Ylä-Outinen	e7cdd47745	Merge branch 'implicit-rdpcm'	2016-10-03 20:04:00 +09:00
Arttu Ylä-Outinen	5fb7afe8c4	Add --implicit-rdpcm command line parameter. Makes it possible to use lossless coding without implicit residual DPCM.	2016-10-03 20:01:55 +09:00
Arttu Ylä-Outinen	5affc0f527	Use implicit RDPCM in lossless mode. Sets implicit RDPCM flag in SPS when lossy coding is disabled and applies DPCM to intra residual when prediction mode is horizontal or vertical.	2016-10-03 19:31:38 +09:00
Arttu Ylä-Outinen	c418db660b	Update preset table in README.md	2016-10-02 20:11:38 +09:00
Ari Koivula	23dc9a0ada	Allow osx to fail on Travis	2016-09-29 17:39:28 +03:00
Ari Koivula	5f5fffb8b5	Merge branch 'new_presets' Significant boost to either BDRate, speed or both for every preset.	2016-09-29 17:36:45 +03:00
Ari Koivula	016dbe0894	Further refine presets The rd-complexity of slow presets is better with a less aggressive GOP. Adding the GOP as part of the preset improved BDRate enough that it didn't make sense anymore to have veryslow target the best BDRate. Instead, push that responsibility to placebo by making it a little bit faster.	2016-09-29 17:35:12 +03:00
Ari Koivula	278cd4da9b	Disable WPP in Travis tile tests Now that WPP is on by default, Valgrind is finding memory leaks on these tests. It's not a priority so I'll just disable it for now. ==8120== Memcheck, a memory error detector ==8120== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al. ==8120== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info ==8120== Command: /home/travis/build/Venti-/kvazaar/src/.libs/lt-kvazaar -i mandelbrot_264x130.yuv --input-res=264x130 -o test.265 -p4 -r2 --owf=1 --threads=2 --tiles-height-split=u2 --rd=0 --no-rdoq --no-deblock --no-sao --no-signhide --subme=0 --pu-depth-inter=1-3 --pu-depth-intra=2-3 ==8120== Disabling TMVP because tiles are used. Compiled: INTEL, flags: MMX SSE SSE2 Detected: INTEL, flags: MMX SSE SSE2 SSE3 SSSE3 SSE41 SSE42 Available: sse2(2) sse41(1) In use: sse2(1) sse41(1) Input: mandelbrot_264x130.yuv, output: test.265 Video size: 264x136 (input=264x130) ==8120== Conditional jump or move depends on uninitialised value(s) ==8120== at 0x4E5FEE5: kvz_threadqueue_job_dep_add (threadqueue.c:616) ==8120== by 0x4E3DEAB: encoder_state_worker_encode_children (encoderstate.c:432) ==8120== by 0x4E3E219: encoder_state_encode (encoderstate.c:649) ==8120== by 0x4E3DE35: encoder_state_worker_encode_children (encoderstate.c:417) ==8120== by 0x4E3E219: encoder_state_encode (encoderstate.c:649) ==8120== by 0x4E3DE35: encoder_state_worker_encode_children (encoderstate.c:417) ==8120== by 0x4E3E219: encoder_state_encode (encoderstate.c:649) ==8120== by 0x4E3ECBD: kvz_encode_one_frame (encoderstate.c:941) ==8120== by 0x4E4DA22: kvazaar_encode (kvazaar.c:229) ==8120== by 0x4E4E228: kvazaar_field_encoding_adapter (kvazaar.c:280) ==8120== by 0x40137F: main (encmain.c:436) ==8120== lt-kvazaar: threadqueue.c:618: kvz_threadqueue_job_dep_add: Assertion `job && depends_on' failed. ==8120== ==8120== HEAP SUMMARY: ==8120== in use at exit: 1,320,764 bytes in 568 blocks ==8120== total heap usage: 584 allocs, 16 frees, 1,330,691 bytes allocated ==8120== ==8120== 112 bytes in 1 blocks are definitely lost in loss record 27 of 88 ==8120== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==8120== by 0x4E46BA5: kvz_image_alloc (image.c:49) ==8120== by 0x401E12: input_read_thread (encmain.c:183) ==8120== by 0x55EDE99: start_thread (pthread_create.c:308) ==8120== ==8120== 272 bytes in 1 blocks are possibly lost in loss record 41 of 88 ==8120== at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==8120== by 0x4012034: _dl_allocate_tls (dl-tls.c:297) ==8120== by 0x55EEABC: pthread_create@@GLIBC_2.2.5 (allocatestack.c:571) ==8120== by 0x4012B9: main (encmain.c:404) ==8120== ==8120== 544 bytes in 2 blocks are possibly lost in loss record 45 of 88 ==8120== at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==8120== by 0x4012034: _dl_allocate_tls (dl-tls.c:297) ==8120== by 0x55EEABC: pthread_create@@GLIBC_2.2.5 (allocatestack.c:571) ==8120== by 0x4E5EF65: kvz_threadqueue_init (threadqueue.c:308) ==8120== by 0x4E3BD2F: kvz_encoder_control_init (encoder.c:173) ==8120== by 0x4E4DD7E: kvazaar_open (kvazaar.c:80) ==8120== by 0x401112: main (encmain.c:346) ==8120== ==8120== 53,856 bytes in 1 blocks are possibly lost in loss record 81 of 88 ==8120== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==8120== by 0x4E46BEC: kvz_image_alloc (image.c:59) ==8120== by 0x401E12: input_read_thread (encmain.c:183) ==8120== by 0x55EDE99: start_thread (pthread_create.c:308) ==8120== ==8120== LEAK SUMMARY: ==8120== definitely lost: 112 bytes in 1 blocks ==8120== indirectly lost: 0 bytes in 0 blocks ==8120== possibly lost: 54,672 bytes in 4 blocks ==8120== still reachable: 1,265,980 bytes in 563 blocks ==8120== suppressed: 0 bytes in 0 blocks ==8120== Reachable blocks (those to which a pointer was found) are not shown. ==8120== To see them, rerun with: --leak-check=full --show-reachable=yes ==8120== ==8120== For counts of detected and suppressed errors, rerun with: -v ==8120== Use --track-origins=yes to see where uninitialised values come from ==8120== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 2 from 2)	2016-09-29 00:21:03 +03:00
Ari Koivula	31c5ff0f16	Add cross-platform core number detection Well, turns out pthread_num_processors_np isn't standard so we need to do this crap. Threw in hyper threading detection as a bonus.	2016-09-29 00:03:21 +03:00
Ari Koivula	8c7351eac8	Fix lp-gop with depth 1 GOPs with depth 1 had the same structure as those with depth 2: g4d3t1 = 3 2 3 1 g4d2t1 = 2 2 2 1 g4d1t1 = 2 2 2 1 It now results in the correct: g4d1t1 = 1 1 1 1	2016-09-29 00:03:21 +03:00
Ari Koivula	a395aeaac9	Set default settings to those of --preset=medium	2016-09-29 00:03:21 +03:00
Ari Koivula	4388fe0d30	Set presets to ratedistortion-complexity optimized versions	2016-09-29 00:03:20 +03:00
Ari Koivula	facb1e16df	Use -p64 -q22 and --gop=lp-g4d3t1 by default Coding inter without GOP of any kind really isn't a very sensible default. Defaulting to B-GOP of some kind would be better, but lp-gop is more robust for now.	2016-09-29 00:03:20 +03:00
Ari Koivula	d7391a9593	Improve default for number of parallel frames	2016-09-29 00:03:20 +03:00
Ari Koivula	19d423ab29	Use all available cores by default	2016-09-29 00:03:20 +03:00
Ari Koivula	3f138f087a	Allow non-gop-length --period for lp-gop	2016-09-29 00:03:19 +03:00
Ari Koivula	16790c9f15	Remove number of references from --gop=lp syntax The number of references should be part of the presets, so gop should be defined separately.	2016-09-29 00:03:19 +03:00
Ari Koivula	cbfa824d1a	Merge branch 'simd'	2016-09-27 20:49:45 +03:00
Ari Koivula	14a7bcba25	Use a faster function for clipped inter SAD Use the vectorized general SSE41 inter SAD in AVX reg_sad for shapes for which we don't have AVX versions yet. Also improves speed of --smp and --amp a lot. Got a 1.25x speedup for: --preset=ultrafast -q 27 --gop=lp-g4d3r3t1 --me-early-termination=on --rd=1 --pu-depth-inter=1-3 --smp --amp * Suite speed_tests: -PASS inter_sad: 0.898M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec) +PASS inter_sad: 2.503M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec) -PASS inter_sad: 115.054M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec) +PASS inter_sad: 133.577M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)	2016-09-27 20:48:30 +03:00
Arttu Ylä-Outinen	4313e56c2d	Add --no-rdoq-skip command line switch	2016-09-11 17:40:16 +09:00
Ari Koivula	19caa1e574	Update README and man page	2016-09-10 21:06:07 +03:00
Ari Koivula	a7a33b08ec	Remove --slice-addresses from usage message And give a warning if it's used. Slices will have to be implemented at some point, but they aren't yet so let's not advertize them.	2016-09-10 21:06:00 +03:00
Eemeli Kallio	f41e428e5f	Removed kvz_skip_unnecessary_rdoq and reworked --rdoq-skip to skip 4x4 blocks when it is on.	2016-09-09 10:26:07 +03:00
Eemeli Kallio	ed9c0b0416	RDOQ reworked in rdo.c. rdoq_signhide now skips coeffs that are after best_last_idx.	2016-09-09 10:16:51 +03:00
Ari Koivula	17f3f6bc86	Add clipped test cases to inter speed tests Add tests for the extreme shapes that can happen when a motion vector points outside the frame. A single pixel case where it probably doesn't make sense to call a vectorized function, and the maximum size where it definitely does make sense to call a vectorized function.	2016-09-01 23:08:16 +03:00
Ari Koivula	02cd17b427	Add faster AVX inter SAD for 32x32 and 64x64 Add implementations for these functions that process the image line by line instead of using the 16x16 function to process block by block. The 32x32 is around 30% faster, and 64x64 is around 15% faster, on Haswell. PASS inter_sad: 28.744M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec) PASS inter_sad: 7.882M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec) to PASS inter_sad: 37.828M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec) PASS inter_sad: 9.081M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)	2016-09-01 21:36:39 +03:00

2266 commits