Commit graph

2527 commits

Author SHA1 Message Date
Arttu Ylä-Outinen db5e750c7f Fix --threads=auto
When --threads=auto was given on the command line, cfg->threads was
actually set to zero, disabling threads altogether. Fixed to set
cfg->threads to -1, so that the number of threads is chosen
automatically.
2017-01-08 17:58:22 +09:00
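A minimal sketch of the option semantics described in this commit: "auto" must map to -1 (choose automatically), while 0 means threading is disabled. The function name parse_threads is hypothetical. Kvazaar's real option parser is not shown in this log, so treat this as an illustration of the value mapping only.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: parse the value given to --threads.
 *   "auto" -> -1 (pick the thread count automatically)
 *   "0"    ->  0 (threading disabled)
 *   "N"    ->  N worker threads
 * Returns 1 on success and 0 if the value is not recognized. */
static int parse_threads(const char *value, int *threads)
{
  if (strcmp(value, "auto") == 0) {
    /* The bug fixed in this commit stored 0 here, which disables threads. */
    *threads = -1;
    return 1;
  }

  char *end = NULL;
  long n = strtol(value, &end, 10);
  if (end == value || *end != '\0' || n < 0 || n > INT_MAX) {
    return 0;
  }
  *threads = (int)n;
  return 1;
}
```

Keeping "auto" as a distinct sentinel (-1) lets later code tell "user asked for no threads" apart from "user asked us to decide".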
Ari Koivula a9e45efcfc Add a fast lane for byte-aligned bitstream writes
The CABAC engine only writes to the bitstream when it has a full byte.
These writes are also always byte-aligned, so there is no need to even
check for stream alignment.

Speedup was around 3% with ultrafast and low QP.
2016-12-23 17:01:44 +02:00
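The idea behind the fast lane can be sketched with a toy bit writer: the general path shifts bits into a cache and flushes whole bytes, while the fast path skips all of that when the caller guarantees byte alignment, as the CABAC engine does. The struct and the names bitstream_t, bs_put and bs_put_byte are hypothetical, not Kvazaar's actual bitstream API, and the sketch assumes the output buffer is large enough.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit writer; the caller provides a large enough buffer. */
typedef struct {
  uint8_t *data;   /* output buffer */
  size_t len;      /* number of complete bytes written */
  uint32_t cache;  /* pending bits, right-aligned */
  int cur_bits;    /* number of pending bits in cache (0..7 between calls) */
} bitstream_t;

/* General path: append the `bits` low-order bits of `value`, MSB first. */
static void bs_put(bitstream_t *s, uint32_t value, int bits)
{
  for (int i = bits - 1; i >= 0; --i) {
    s->cache = (s->cache << 1) | ((value >> i) & 1u);
    if (++s->cur_bits == 8) {
      s->data[s->len++] = (uint8_t)s->cache;
      s->cache = 0;
      s->cur_bits = 0;
    }
  }
}

/* Fast lane: the caller guarantees the stream is byte-aligned
 * (cur_bits == 0), so the byte is stored directly with no alignment
 * check and no bit shuffling. */
static void bs_put_byte(bitstream_t *s, uint8_t byte)
{
  s->data[s->len++] = byte;
}
```

Writing 0xA and 0x5 as two 4-bit values through bs_put produces the single byte 0xA5; after that the stream is aligned again, so bs_put_byte can append directly.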
Jaakko Laitinen deb63f735f Fix gop disabling 2016-12-20 14:25:13 +02:00
Ari Lemmetti 70a52f0e48 10-bit: add missing bit depth adjustment to ssd 2016-11-17 19:28:04 +02:00
Ari Koivula fa078102f1 Fix 32bit compilation
Got a warning about implicit cast from uint64_t to void*.
2016-11-17 17:53:57 +02:00
Ari Koivula 5ceec06bd3 Merge pull request #148 from Venti-/crypto
Crypto
2016-11-16 21:33:55 +02:00
Ari Lemmetti c31207ea7d Optimize intra reference building
- Add a function with reduced logic for the most common case
2016-11-16 18:28:42 +02:00
Ari Lemmetti 02c9e3746c Add AppVeyor badge 2016-11-16 17:12:36 +02:00
Ari Koivula 24f2a23ef8 Remove unnecessary crypto state
The frame does not need its own crypto state, since it always has at
least one sub tile.
2016-11-16 13:58:41 +02:00
Ari Koivula 8951e34fd2 Change crypto.h stubs to print instead of assert 2016-11-16 13:58:41 +02:00
Wassim Hamidouche ea82c38906 correct memory allocation 2016-11-16 12:35:28 +02:00
Wassim Hamidouche da3e2d1d07 resolve parallel encryption 2016-11-16 12:35:28 +02:00
Ari Koivula b8a618e666 Fix problems with >8 bit input
Enforce bit depth promised by --input-bitdepth to avoid crashes when
larger values are provided.

Do the endianness byte swap for all bytes when the buffer gets extended
to a multiple of 8 pixels, and not just for the number of input pixels.

Don't swap bytes on a little-endian system.
2016-11-13 19:58:54 +02:00
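The three fixes in this commit can be sketched together for 16-bit input samples. This assumes the input file stores samples little-endian (implied by "Don't swap bytes on a little-endian system") and that enforcing the bit depth means clamping to the largest value representable in --input-bitdepth bits; clamping is an assumption here, masking would be another option. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when the host stores multi-byte integers little-endian. */
static int host_is_little_endian(void)
{
  const uint16_t probe = 1;
  return *(const uint8_t *)&probe == 1;
}

/* Hypothetical sketch: prepare `count` 16-bit samples read from a
 * little-endian file. The buffer must hold `count` rounded up to a
 * multiple of 8 samples, and the whole padded range is processed,
 * not just the input pixels. */
static void prepare_samples(uint16_t *buf, size_t count, int input_bitdepth)
{
  const size_t padded = (count + 7) & ~(size_t)7;
  const uint16_t max_value = (uint16_t)((1u << input_bitdepth) - 1);
  const int swap = !host_is_little_endian();  /* no swap on little-endian */

  for (size_t i = 0; i < padded; ++i) {
    uint16_t v = buf[i];
    if (swap) {
      v = (uint16_t)((v >> 8) | (v << 8));
    }
    /* Enforce the promised bit depth so oversized values cannot
     * index past tables sized for that depth. */
    if (v > max_value) {
      v = max_value;
    }
    buf[i] = v;
  }
}
```

With --input-bitdepth=10, any sample above 1023 is clamped to 1023, and the padding samples are handled exactly like real ones.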
Ari Koivula 2c005cda25 Fix bug with sub-pixel motion estimation in tiles
The width of the tile was being used to index the frame pixel buffer
instead of the width of the buffer.
2016-11-07 15:53:52 +02:00
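The bug comes down to row addressing: rows of the frame buffer are always one buffer width (stride) apart, whichever tile is being searched. A minimal sketch with a hypothetical ref_pixel helper over an 8-bit frame:

```c
#include <stdint.h>

/* Hypothetical sketch: fetch a reference pixel at (x, y) relative to a
 * tile whose top-left corner is (tile_x, tile_y) in the frame. The row
 * offset must use the buffer width (stride), not the tile width. */
static uint8_t ref_pixel(const uint8_t *frame, int stride,
                         int tile_x, int tile_y, int x, int y)
{
  return frame[(tile_y + y) * stride + (tile_x + x)];
}
```

For an 8-pixel-wide frame split into two 4-pixel-wide tiles, pixel (1, 2) of the right tile is frame[2 * 8 + 5] = frame[21]; using the tile width 4 as the stride would wrongly read frame[13].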
Ari Koivula bb33cd3901 Update lp-gop syntax in README 2016-11-04 17:22:24 +02:00
Ari Koivula 78a28e0338 Reformat --help message
- Reduce indentation to 6 spaces
- Word wrap everything to under 80 characters
- Remove defaults from options covered by presets
- Add a dash in front of argument descriptions
- Add --(no-) to names of parameters that accept it and remove mention
  of enabling or disabling
- Add executable and scripts as a dependency to make docs
2016-11-04 15:40:28 +02:00
Ari Koivula 98a0d54b70 Merge branch 'dts-fix' 2016-10-28 19:06:22 +03:00
Ari Koivula d18de19d8a Fix DTS and PTS not being passed on through lib API
Fixes "cur_dts is invalid" warning from FFmpeg.
2016-10-28 19:05:47 +03:00
Ari Koivula 0c41c2ebd6 Make CLI set PTS for each input picture
This value is not represented in the HEVC bitstream, which is why it
was not set previously. FFmpeg sets and needs it however, so make the
CLI set it as well to make sure we handle it correctly.
2016-10-28 19:03:03 +03:00
Ari Koivula c9cfe8d76b Merge branch 'help' 2016-10-27 03:32:22 +03:00
Ari Koivula c7da5e981b Update README and manpage 2016-10-27 03:29:53 +03:00
Ari Koivula 5bf745460d Re-categorize options in the help message
- Move VUI stuff to the bottom
- Merge Parallel processing, WPP, Tiles and slices
- Add more categories for the other options
2016-10-27 03:26:15 +03:00
Ari Koivula cb6672b452 Disable WPP when Tiles are enabled
Closes #142.
2016-10-27 02:07:10 +03:00
Ari Koivula 4990b0d528 Merge pull request #145 from darealshinji/patch-1
Bump KVZ_VERSION
2016-10-25 19:42:19 +03:00
Ari Koivula 6a162f3bc5 Merge pull request #144 from wiiaboo/appveyor
Add appveyor scripts to test with MSYS2
2016-10-25 19:41:01 +03:00
darealshinji 488d042e5f Bump KVZ_VERSION 2016-10-25 12:32:13 +02:00
Ricardo Constantino e269b86539 Add appveyor scripts to test with MSYS2 2016-10-21 15:39:29 +01:00
Ari Lemmetti 29153ed503 Remove unused variable 2016-10-21 17:28:42 +03:00
Ari Lemmetti a1390ca3c0 Merge branch 'ssd-avx2' 2016-10-21 15:08:44 +03:00
Ari Lemmetti 778e46dfd8 Add AVX2 version of SSD 2016-10-21 15:07:53 +03:00
Ari Lemmetti 6f5d7c9e06 Move SSD to strategies 2016-10-21 15:07:23 +03:00
Ari Lemmetti 89b941eab4 Fix typo 2016-10-21 15:07:02 +03:00
Ari Koivula bfdd492c9f Merge pull request #141 from aballier/multilib
Include i386 & i486 for compiling intel asm.
2016-10-19 21:19:25 +03:00
Alexis Ballier 1dcc993743 Include i386 & i486 for compiling intel asm.
x86_64-pc-linux-gnu-gcc -m32, which I use for building 32-bit libraries on amd64, defines only __i386__.
2016-10-14 18:07:37 +02:00
Arttu Ylä-Outinen 8ae791a3e1 Fix building with crypto++
Depending on the distro, the pkg-config package name of crypto++ could
be either cryptopp or libcrypto++. This commit changes configure to
check for both instead of cryptopp only.
2016-10-10 15:13:20 +09:00
Arttu Ylä-Outinen e7cdd47745 Merge branch 'implicit-rdpcm' 2016-10-03 20:04:00 +09:00
Arttu Ylä-Outinen 5fb7afe8c4 Add --implicit-rdpcm command line parameter.
Makes it possible to use lossless coding without implicit residual DPCM.
2016-10-03 20:01:55 +09:00
Arttu Ylä-Outinen 5affc0f527 Use implicit RDPCM in lossless mode.
Sets implicit RDPCM flag in SPS when lossy coding is disabled and
applies DPCM to intra residual when prediction mode is horizontal or
vertical.
2016-10-03 19:31:38 +09:00
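Implicit RDPCM for a lossless intra block can be sketched as follows. Vertical prediction (HEVC intra mode 26) replaces each residual sample with its difference from the sample above it, and horizontal prediction (mode 10) uses the sample to its left; other modes are untouched. The function name is hypothetical and this is not Kvazaar's code, only an illustration of the residual transform the commit describes.

```c
#include <stdint.h>

enum { INTRA_HOR = 10, INTRA_VER = 26 };  /* HEVC angular mode numbers */

/* Hypothetical sketch: apply implicit RDPCM in place to a square
 * width x width lossless intra residual block. Rows/columns are visited
 * from the end backwards so that every difference is taken against the
 * original, not yet modified, neighbour. */
static void implicit_rdpcm(int16_t *res, int width, int mode)
{
  if (mode == INTRA_VER) {
    for (int y = width - 1; y > 0; --y)
      for (int x = 0; x < width; ++x)
        res[y * width + x] = (int16_t)(res[y * width + x] - res[(y - 1) * width + x]);
  } else if (mode == INTRA_HOR) {
    for (int y = 0; y < width; ++y)
      for (int x = width - 1; x > 0; --x)
        res[y * width + x] = (int16_t)(res[y * width + x] - res[y * width + x - 1]);
  }
}
```

For the 2x2 residual {1, 2, 3, 5}, vertical mode gives {1, 2, 2, 3} and horizontal mode gives {1, 1, 3, 2}. The decoder undoes this with a running sum in the same direction.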
Arttu Ylä-Outinen c418db660b Update preset table in README.md 2016-10-02 20:11:38 +09:00
Ari Koivula 23dc9a0ada Allow osx to fail on Travis 2016-09-29 17:39:28 +03:00
Ari Koivula 5f5fffb8b5 Merge branch 'new_presets'
Significant boost to either BDRate, speed or both for every preset.
2016-09-29 17:36:45 +03:00
Ari Koivula 016dbe0894 Further refine presets
The rd-complexity of slow presets is better with a less aggressive GOP.

Adding the GOP as part of the preset improved BDRate enough that it
no longer made sense to have veryslow target the best BDRate.
Instead, push that responsibility to placebo by making it a little bit
faster.
2016-09-29 17:35:12 +03:00
Ari Koivula 278cd4da9b Disable WPP in Travis tile tests
Now that WPP is on by default, Valgrind is finding memory leaks on
these tests. It's not a priority so I'll just disable it for now.

==8120== Memcheck, a memory error detector
==8120== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==8120== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==8120== Command: /home/travis/build/Venti-/kvazaar/src/.libs/lt-kvazaar -i mandelbrot_264x130.yuv --input-res=264x130 -o test.265 -p4 -r2 --owf=1 --threads=2 --tiles-height-split=u2 --rd=0 --no-rdoq --no-deblock --no-sao --no-signhide --subme=0 --pu-depth-inter=1-3 --pu-depth-intra=2-3
==8120==
Disabling TMVP because tiles are used.
Compiled: INTEL, flags: MMX SSE SSE2
Detected: INTEL, flags: MMX SSE SSE2 SSE3 SSSE3 SSE41 SSE42
Available: sse2(2) sse41(1)
In use: sse2(1) sse41(1)
Input: mandelbrot_264x130.yuv, output: test.265
  Video size: 264x136 (input=264x130)
==8120== Conditional jump or move depends on uninitialised value(s)
==8120==    at 0x4E5FEE5: kvz_threadqueue_job_dep_add (threadqueue.c:616)
==8120==    by 0x4E3DEAB: encoder_state_worker_encode_children (encoderstate.c:432)
==8120==    by 0x4E3E219: encoder_state_encode (encoderstate.c:649)
==8120==    by 0x4E3DE35: encoder_state_worker_encode_children (encoderstate.c:417)
==8120==    by 0x4E3E219: encoder_state_encode (encoderstate.c:649)
==8120==    by 0x4E3DE35: encoder_state_worker_encode_children (encoderstate.c:417)
==8120==    by 0x4E3E219: encoder_state_encode (encoderstate.c:649)
==8120==    by 0x4E3ECBD: kvz_encode_one_frame (encoderstate.c:941)
==8120==    by 0x4E4DA22: kvazaar_encode (kvazaar.c:229)
==8120==    by 0x4E4E228: kvazaar_field_encoding_adapter (kvazaar.c:280)
==8120==    by 0x40137F: main (encmain.c:436)
==8120==
lt-kvazaar: threadqueue.c:618: kvz_threadqueue_job_dep_add: Assertion `job && depends_on' failed.
==8120==
==8120== HEAP SUMMARY:
==8120==     in use at exit: 1,320,764 bytes in 568 blocks
==8120==   total heap usage: 584 allocs, 16 frees, 1,330,691 bytes allocated
==8120==
==8120== 112 bytes in 1 blocks are definitely lost in loss record 27 of 88
==8120==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8120==    by 0x4E46BA5: kvz_image_alloc (image.c:49)
==8120==    by 0x401E12: input_read_thread (encmain.c:183)
==8120==    by 0x55EDE99: start_thread (pthread_create.c:308)
==8120==
==8120== 272 bytes in 1 blocks are possibly lost in loss record 41 of 88
==8120==    at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8120==    by 0x4012034: _dl_allocate_tls (dl-tls.c:297)
==8120==    by 0x55EEABC: pthread_create@@GLIBC_2.2.5 (allocatestack.c:571)
==8120==    by 0x4012B9: main (encmain.c:404)
==8120==
==8120== 544 bytes in 2 blocks are possibly lost in loss record 45 of 88
==8120==    at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8120==    by 0x4012034: _dl_allocate_tls (dl-tls.c:297)
==8120==    by 0x55EEABC: pthread_create@@GLIBC_2.2.5 (allocatestack.c:571)
==8120==    by 0x4E5EF65: kvz_threadqueue_init (threadqueue.c:308)
==8120==    by 0x4E3BD2F: kvz_encoder_control_init (encoder.c:173)
==8120==    by 0x4E4DD7E: kvazaar_open (kvazaar.c:80)
==8120==    by 0x401112: main (encmain.c:346)
==8120==
==8120== 53,856 bytes in 1 blocks are possibly lost in loss record 81 of 88
==8120==    at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8120==    by 0x4E46BEC: kvz_image_alloc (image.c:59)
==8120==    by 0x401E12: input_read_thread (encmain.c:183)
==8120==    by 0x55EDE99: start_thread (pthread_create.c:308)
==8120==
==8120== LEAK SUMMARY:
==8120==    definitely lost: 112 bytes in 1 blocks
==8120==    indirectly lost: 0 bytes in 0 blocks
==8120==      possibly lost: 54,672 bytes in 4 blocks
==8120==    still reachable: 1,265,980 bytes in 563 blocks
==8120==         suppressed: 0 bytes in 0 blocks
==8120== Reachable blocks (those to which a pointer was found) are not shown.
==8120== To see them, rerun with: --leak-check=full --show-reachable=yes
==8120==
==8120== For counts of detected and suppressed errors, rerun with: -v
==8120== Use --track-origins=yes to see where uninitialised values come from
==8120== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 2 from 2)
2016-09-29 00:21:03 +03:00
Ari Koivula 31c5ff0f16 Add cross-platform core number detection
Well, turns out pthread_num_processors_np isn't standard so we need to
do this crap. Threw in hyper threading detection as a bonus.
2016-09-29 00:03:21 +03:00
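Since pthread_num_processors_np is non-standard, a portable replacement has to pick a platform API at compile time. A minimal sketch, assuming GetSystemInfo on Windows and sysconf(_SC_NPROCESSORS_ONLN) elsewhere (a widely supported extension on Linux, macOS and the BSDs, though not required by POSIX), with a fallback of one core. The function name is hypothetical, and the hyper-threading detection mentioned in the commit is not covered.

```c
#if defined(_WIN32)
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical sketch: number of logical processors currently online.
 * Always returns at least 1. */
static int detect_cpu_count(void)
{
#if defined(_WIN32)
  SYSTEM_INFO info;
  GetSystemInfo(&info);
  return info.dwNumberOfProcessors > 0 ? (int)info.dwNumberOfProcessors : 1;
#elif defined(_SC_NPROCESSORS_ONLN)
  long n = sysconf(_SC_NPROCESSORS_ONLN);
  return n > 0 ? (int)n : 1;
#else
  return 1;  /* unknown platform: assume a single core */
#endif
}
```

The count reported here is logical processors, so on a hyper-threaded CPU it is typically twice the number of physical cores.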
Ari Koivula 8c7351eac8 Fix lp-gop with depth 1
GOPs with depth 1 had the same structure as those with depth 2:
g4d3t1 = 3 2 3 1
g4d2t1 = 2 2 2 1
g4d1t1 = 2 2 2 1

It now results in the correct:
g4d1t1 = 1 1 1 1
2016-09-29 00:03:21 +03:00
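One way to reproduce the corrected sequences is sketched below. It assumes, as an interpretation not stated in the commit, that the digits in g4d3t1 = 3 2 3 1 are the hierarchy layer of each of the four pictures in the GOP. The last picture sits on layer 1, the middle one on layer 2, and so on, and layers deeper than the requested depth are merged into the deepest allowed layer. The function lp_gop_layer is hypothetical and not Kvazaar's implementation.

```c
/* Hypothetical sketch: layer of picture `pos` (1..gop_len) in a
 * low-delay GOP whose length is a power of two, limited to `depth`
 * layers. Requires pos >= 1. */
static int lp_gop_layer(int pos, int gop_len, int depth)
{
  int full_depth = 1;  /* layers in a full hierarchy: log2(gop_len) + 1 */
  for (int g = gop_len; g > 1; g >>= 1) {
    ++full_depth;
  }

  int tz = 0;          /* number of trailing zero bits in pos */
  for (int p = pos; (p & 1) == 0; p >>= 1) {
    ++tz;
  }

  int layer = full_depth - tz;
  return layer < depth ? layer : depth;  /* clamp to the requested depth */
}
```

For a GOP of 4 this gives 3 2 3 1 at depth 3, 2 2 2 1 at depth 2 and 1 1 1 1 at depth 1, matching the sequences listed in the commit message.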
Ari Koivula a395aeaac9 Set default settings to those of --preset=medium 2016-09-29 00:03:21 +03:00
Ari Koivula 4388fe0d30 Set presets to rate-distortion-complexity optimized versions 2016-09-29 00:03:20 +03:00
Ari Koivula facb1e16df Use -p64 -q22 and --gop=lp-g4d3t1 by default
Coding inter without GOP of any kind really isn't a very sensible
default. Defaulting to B-GOP of some kind would be better,
but lp-gop is more robust for now.
2016-09-29 00:03:20 +03:00
Ari Koivula d7391a9593 Improve default for number of parallel frames 2016-09-29 00:03:20 +03:00
Ari Koivula 19d423ab29 Use all available cores by default 2016-09-29 00:03:20 +03:00