Commit graph

421 commits

Author SHA1 Message Date
Reima Hyvönen 1fcc5c6a8d Merge branch 'bipred_recon' 2018-12-11 09:59:35 +02:00
Reima Hyvönen e4a10880f3 Added case 12 to bipred_recon no mov 2018-12-11 09:52:17 +02:00
Marko Viitanen a4f3968e52 Fix Visual Studio errors by initializing some variables used in AVX2 signhiding 2018-12-11 09:33:26 +02:00
Pauli Oikkonen c465578048 Add a descriptive comment to coefficient reordering 2018-12-03 15:36:32 +02:00
Pauli Oikkonen f78bf2ebcb Optimize q_coefs usage for indexed fetch 2018-12-03 15:36:32 +02:00
Pauli Oikkonen d9591f1b49 Eliminate midway buffering of reordered coefs
TODO: For some mysterious reason seems slightly slower than the
buffered one
2018-12-03 15:36:32 +02:00
Pauli Oikkonen 7fe454c51f Optimize get_cheapest_alternative() 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 6bbd3e5a44 Optimize rearrange_512 function 2018-12-03 15:36:32 +02:00
Pauli Oikkonen cb8209d1b3 Vectorize transform coefficient reordering loop 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 7cf4c7ae5f Rename "reduce" functions to hsum
That's what the functions fundamendally do anyway
2018-12-03 15:36:32 +02:00
Pauli Oikkonen 316cd8a846 Fix ALIGNED keyword and grow alignment to 64B 2018-12-03 15:36:32 +02:00
Pauli Oikkonen 1befc69a4c Implement sign bit hiding in AVX2 2018-12-03 15:36:32 +02:00
Reima Hyvönen f8696b54a4 Updated bipred_recon_avx2 in avx2/picture-avx2.c. Now it detects blocks that can be not equal to 8 (ie. width = 12) 2018-11-20 17:09:19 +02:00
Reima Hyvönen 710ba288db Chroma has some problems 2018-11-15 16:42:48 +02:00
Ari Lemmetti a832206bb6 Replace 32-bit incompatible instrinsics 2018-11-12 18:54:33 +02:00
Ari Lemmetti 5c774c4105 Rewrite most of FME and interpolation filters
Changes had to break a lot of stuff and were just squashed into this horrible code dump
2018-11-08 20:21:16 +02:00
Reima Hyvönen 7406c33a42 Some more cleaning 2018-10-26 12:25:18 +03:00
Reima Hyvönen 4c71546b2e Cleaned some coding 2018-10-26 12:19:44 +03:00
Reima Hyvönen 4fe3909e48 Switched luma to use 32bits size ints intstead of 16bit size 2018-10-24 18:24:46 +03:00
Marko Viitanen 465bc2cfee [EMT] make functions static and prefix arrays with kvz_g 2018-10-18 10:54:33 +03:00
Marko Viitanen 169febd1c4 [EMT] Simplify DCT8, DCT5, DST1 and DST7 definitions 2018-10-17 12:17:54 +03:00
Marko Viitanen e015d7eb2b Fix compiler warnings 2018-10-17 10:43:11 +03:00
Marko Viitanen ad310c77d3 Added EMT transforms to the strategies 2018-10-17 08:56:49 +03:00
Reima Hyvönen 381e786e10 Trying to find the bug in luma 2018-10-11 18:08:41 +03:00
Reima Hyvönen 2f5f81bac3 removed the non-optimated bipred function 2018-10-09 11:19:23 +03:00
Reima Hyvönen 212a8e68fa Modified to avoid memory overflow, still some bug inside luma 2018-10-02 20:23:32 +03:00
Marko Viitanen 389aeebe07 Added 2x2 transform functions 2018-09-13 14:51:07 +03:00
Marko Viitanen 445c059b4a Fix transforms for VTM 2.0, generated new transform matrices and added a shift by 2 for forward and inverse 2018-09-13 14:39:49 +03:00
Marko Viitanen 382917bcd3 New table for choosing angular intra filtered references and a small bugfix on the end condition of angular intra 2018-09-13 09:35:55 +03:00
Marko Viitanen d4ed0ee3ad Fixed some array offsets in intra angular prediction 2018-09-12 08:53:17 +03:00
Sami Ahovainio 787264f568 Fixed dst indexing in kvz_angular_pred_generic 2018-08-31 10:36:28 +03:00
Sami Ahovainio d2291fea83 Intra mode scaling moved from angular prediction to kvz_intra_predict. pdpc implemented in kvz_intra_predict. 2018-08-31 10:01:28 +03:00
Sami Ahovainio 54ebadfc43 Clarifying comments and changes towards WAIP 2018-08-29 16:00:08 +03:00
Reima Hyvönen 896034b7cf Some renamed functions back 2018-08-28 15:31:10 +03:00
Reima Hyvönen e8b5e6db4c Did some merging 2018-08-28 15:26:27 +03:00
Reima Hyvönen 7de5c74434 Updated bipred_recon to work faster 2018-08-28 15:12:31 +03:00
Reima Hyvönen 47b357cca2 Comment one test 2018-08-27 18:52:14 +03:00
Reima Hyvönen 2ca99a44e8 Updated shuffle operation to be in right order 2018-08-27 18:16:38 +03:00
Sami Ahovainio 42741a2c40 Some changes for PCM and Intra towards VTM 2.0 compatibility. 2018-08-27 09:18:15 +03:00
Marko Viitanen 4f7da86285 Commented out sign hiding code, which is not used in VVC 2018-08-17 09:38:11 +03:00
Marko Viitanen c9cbdd5dc3 Added couple of ToDo comments for large CTU support 2018-08-17 09:37:14 +03:00
Marko Viitanen daf041406f Disable DST 2018-08-16 16:05:32 +03:00
Reima Hyvönen 508b218a12 some modifications made to prevent reading too much 2018-08-14 10:50:39 +03:00
Reima Hyvönen 1d935ee888 some useless stuff removed 2018-08-13 16:47:11 +03:00
Reima Hyvönen ce3ac4c05e some modifications to no_mov 2018-08-13 16:41:02 +03:00
Reima Hyvönen 15a613ae94 test if no_mov breaks testing 2018-08-13 16:02:56 +03:00
Reima Hyvönen 97a2049e58 removed pointer declaration out from switch 2018-08-10 16:42:26 +03:00
Reima Hyvönen aa94bcedbc Stream is now pointer 2018-08-10 16:38:49 +03:00
Reima Hyvönen fa5b227ece 256 to 32 doesn't work, made them by hand 2018-08-10 16:01:20 +03:00
Reima Hyvönen 408dedbcc8 removed _mm256_extract_epi8 and replaced with _mm_stream 2018-08-10 15:53:26 +03:00
Reima Hyvönen 31c35091c6 _mm256_cvtsi256_si32 removed 2018-08-10 10:06:40 +03:00
Reima Hyvönen 99dc43074f _mm256_cvtsi256_si32 breaks system, too much bits. back to extract 2018-08-10 09:59:33 +03:00
Reima Hyvönen 4f1f80b2cb Transformed convert from 256 to cast 256 -> 128 and then convert from 128 2018-08-09 15:35:54 +03:00
Reima Hyvönen 4957555eb3 Removed leftover from 939 2018-08-09 15:25:03 +03:00
Reima Hyvönen 28b165c971 Clearified some sections, added _MM_SHUFFLE macro 2018-08-09 15:23:01 +03:00
Reima Hyvönen dd04df8667 testing if error in both avx2 functions 2018-08-03 11:49:00 +03:00
Reima Hyvönen ed50d71fde Switched some variables to different location, altered inter_recon_bipred_avx2 function 2018-08-02 16:08:59 +03:00
Reima Hyvönen f5739a0028 Renaming and removing useless prints 2018-08-02 14:47:17 +03:00
Reima Hyvönen bc09f59bb6 Edited some definitions 2018-08-02 11:54:53 +03:00
Reima Hyvönen a4bf77f208 Tested some extract functions 2018-07-12 09:29:32 +03:00
Reima Hyvönen c05033a893 Even more useless vectors removed 2018-07-11 15:09:14 +03:00
Reima Hyvönen 884cb77238 Removed some not used vectors 2018-07-11 15:06:11 +03:00
Reima Hyvönen 792689a5ff Removed for-loops, added extract instead 2018-07-11 14:56:41 +03:00
Reima Hyvönen f9c7f6ee66 Added some break-operations for avx2 optimation 2018-07-11 14:15:38 +03:00
Reima Hyvönen cc064da143 some more optimation for bipred 2018-07-11 11:27:54 +03:00
Reima Hyvönen a22cf03ddb Updated to have no movement function to avx2 strategies 2018-07-10 16:07:15 +03:00
Reima Hyvönen ea83ae45f0 Toimiva ratkaisu 2018-07-03 11:18:51 +03:00
Reima Hyvönen 17babfffa4 25.6 working optimation, ~50% faster than original 2018-06-25 17:06:16 +03:00
Reima Hyvönen 9fed29f950 optimation for inter_recon_bipred 2018-04-18 15:25:44 +03:00
Arttu Ylä-Outinen 0a69e6d18f Fix selection of transform function for 4x4 blocks
DST function was returned for inter luma transform blocks of size 4x4
even though they must use DCT. Fixed by checking the prediction mode of
the block in addition to whether it is chroma or luma.
2018-01-18 10:36:25 +02:00
Arttu Ylä-Outinen 9694bd2fae Fix build on 32-bit systems
Function coeff_abs_sum_avx2 that was added in e950c9b was outside the
AVX2 #if directive.
2017-07-28 09:19:29 +03:00
Arttu Ylä-Outinen e950c9b101 Add AVX2 implementation for coefficient sum 2017-07-28 07:39:36 +03:00
Arttu Ylä-Outinen d50ae6990c Add sum of absolute coefficients to strategies 2017-07-28 07:39:15 +03:00
Arttu Ylä-Outinen fdb3480b54 Enable strategies for SAO reconstruction
Re-enables strategies for SAO reconstruction. They were disabled in
commit ec9ff42.
2017-07-11 10:35:18 +03:00
Arttu Ylä-Outinen 333dba3884 Add static to SAO strategies 2017-07-11 10:02:01 +03:00
Arttu Ylä-Outinen 563bc26e71 Fix out-of-bounds read in AVX2 SAO
AVX2 version of SAO loaded offsets with a 256 bit read even though there
are only five 32 bit integers.
2017-07-06 13:04:52 +03:00
Arttu Ylä-Outinen 2c66e0bbd2 Fix warnings about invalid reads in AVX2 ipol
AVX2 filter functions read pixels in chunks of 8 or 16 bytes. At the end
of the block, the read goes out of the bounds of the pixels array. The
extra pixels do not affect the result.

Fixes valgrind complaining about the invalid reads by allocating 5 extra
pixels in kvz_get_extended_block_avx2
2017-06-22 09:37:55 +03:00
Arttu Ylä-Outinen 95775a1645 Change coefficient storage order
Changes coefficient storage order to a zig-zag order. Reduces
unnecessary copying of coefficients to temporary arrays.
2017-05-12 16:46:57 +03:00
Arttu Ylä-Outinen 51786eda67 Drop redundant fields in encoder_control_t
Some of the fields in encoder_control_t were simply copies of the
corresponding fields in kvz_config. This commit drops the copied fields
in favor of using the fields in encoder_control_t.cfg directly.
2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen e78a8dfcf5 Copy the kvz_config passed to encoder_open
The kvz_config struct is created by the user but kvazaar keeps a pointer
to it. It is easy to break things by modifying the configuration outside
kvazaar. In addition, kvazaar modifies the struct even though it is has
a const modifier.

This commit changes the field cfg in encoder_control_t to be a copy of
the kvz_config struct instead of a pointer, removing modifications to
the const struct and allowing users to do whatever they want with it
after opening the encoder.
2017-02-09 13:23:54 +09:00
Arttu Ylä-Outinen 640ff94ecd Use separate lambda and QP for each LCU
Adds fields lambda, lambda_sqrt and qp to encoder_state_t. Drops field
cur_lambda_cost_sqrt from encoder_state_config_frame_t and renames
cur_lambda_cost to lambda.
2017-01-09 01:24:23 +09:00
Ari Lemmetti 70a52f0e48 10-bit: add missing bit depth adjustment to ssd 2016-11-17 19:28:04 +02:00
Ari Lemmetti 29153ed503 Remove unused variable 2016-10-21 17:28:42 +03:00
Ari Lemmetti 778e46dfd8 Add AVX2 version of SSD 2016-10-21 15:07:53 +03:00
Ari Lemmetti 6f5d7c9e06 Move SSD to strategies 2016-10-21 15:07:23 +03:00
Ari Lemmetti 89b941eab4 Fix typo 2016-10-21 15:07:02 +03:00
Ari Koivula cbfa824d1a Merge branch 'simd' 2016-09-27 20:49:45 +03:00
Ari Koivula 14a7bcba25 Use a faster function for clipped inter SAD
Use the vectorized general SSE41 inter SAD in AVX reg_sad for shapes
for which we don't have AVX versions yet.

Also improves speed of --smp and --amp a lot. Got a 1.25x speedup for:
--preset=ultrafast -q 27 --gop=lp-g4d3r3t1 --me-early-termination=on --rd=1 --pu-depth-inter=1-3 --smp --amp

* Suite speed_tests:
-PASS inter_sad: 0.898M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 2.503M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
-PASS inter_sad: 115.054M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 133.577M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
2016-09-27 20:48:30 +03:00
Eemeli Kallio f41e428e5f Removed kvz_skip_unnecessary_rdoq and reworked --rdoq-skip to skip 4x4 blocks when it is on. 2016-09-09 10:26:07 +03:00
Ari Koivula 02cd17b427 Add faster AVX inter SAD for 32x32 and 64x64
Add implementations for these functions that process the image line by
line instead of using the 16x16 function to process block by block.

The 32x32 is around 30% faster, and 64x64 is around 15% faster,
on Haswell.

PASS inter_sad: 28.744M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec)
PASS inter_sad: 7.882M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)
to
PASS inter_sad: 37.828M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec)
PASS inter_sad: 9.081M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)
2016-09-01 21:36:39 +03:00
Ari Lemmetti 28c4174d0e Fix incorrect shuffle parameters
_MM_SHUFFLE uses reverse order
2016-08-23 19:40:46 +03:00
Ari Lemmetti ce77bfa15b Replace KVZ_PERMUTE with _MM_SHUFFLE
The same exact macro already exists
2016-08-22 19:08:46 +03:00
Eemeli Kallio 99d8b9abeb Changed skip_rdoq name to kvz_skip_unnecessary_rdoq. Changed the order it uses when it goes through CGs and tuned its sum calculation. 2016-08-18 14:02:56 +03:00
Eemeli Kallio 1fb4755f31 Added rdoq-skip to quant-generic.c 2016-08-18 12:17:54 +03:00
Eemeli Kallio d20ac03ca2 Added --rdoq-skip option 2016-08-18 12:17:53 +03:00
Arttu Ylä-Outinen 2a946bd88e Rename encoder_state_t.global to frame
"Frame" is more accurate than "global" since when OWF is used, encoder
states for each frame have their own struct.
2016-08-10 13:22:36 +09:00
Arttu Ylä-Outinen 5fbb0a8c27 Fix includes 2016-08-10 13:05:40 +09:00
Ari Lemmetti 6bcba004ff Comment out to fix unused code error on clang. 2016-07-14 14:12:16 +03:00
Ari Lemmetti c0979ebdcb Implement AVX2 luma sampling 2016-07-14 12:53:02 +03:00
Ari Lemmetti 6244560426 Add avx2 strategy for kvz_filter_frac_blocks_luma. 2016-07-14 12:53:02 +03:00
Ari Lemmetti 9c4e9e049b Load only what is needed. Eliminate latency from hadds. 2016-07-14 12:53:01 +03:00
Ari Lemmetti fccfbd2f28 Add strategy for kvz_filter_frac_blocks_luma 2016-07-14 12:51:02 +03:00
Ari Lemmetti 2b0c8db349 Add quad satd for avx2 2016-07-14 12:50:24 +03:00
Ari Lemmetti 0ff69fd6f8 Add any size multi satd 2016-07-14 12:48:37 +03:00
Arttu Ylä-Outinen bf26661782 Add support for 4x4 blocks to SATD_ANY_SIZE.
Makes functions satd_any_size_generic and satd_any_size_8bit_avx2 work
on blocks whose width and/or height are not multiples of 8.
2016-06-16 18:53:17 +09:00
Ari Lemmetti 3107a93eaf Fix avx2 chroma sampling for amp 2016-05-17 14:09:57 +03:00
Ari Lemmetti efbdc5dade Utilize registers more efficiently for 8x8 and larger blocks 2016-04-21 13:26:38 +03:00
Ari Lemmetti 192cee95b2 Vectorize vertical filtering 2016-04-21 13:26:38 +03:00
Ari Lemmetti 0be35f72b8 Filter 4 pixels simultaneously in x direction 2016-04-21 13:26:38 +03:00
Ari Lemmetti 10484bda9f Make strategies out of fractional pixel sample functions 2016-04-21 13:26:38 +03:00
Ari Lemmetti 8247faf8e0 Remove 64-bit only instruction to fix 32-bit compilation. 2016-04-19 18:05:11 +03:00
Ari Lemmetti eb55d6b6b9 Fix writing over boundary. 2016-04-19 16:03:43 +03:00
Ari Lemmetti bcabc6fadd Remove pixel blit from strategies. Use memcpy instead. 2016-04-06 18:44:04 +03:00
Ari Koivula 61fc3e87ba Run include-what-you-use fix_includes.py fix_includes.py
The includes should make more sense now and not just happen to compile
due to headers included from other headers.

Used a modified version of IWYU. Modifications were to attribute int8_t
and so on to stdint.h instead of sys/types.h and immintrin.h instead of
more specific headers.

include-what-you-use 0.7 (git:b70df35)
based on clang version 3.9.0 (trunk 264728)
2016-04-01 17:46:55 +03:00
Ari Koivula 8908d85d66 Change all relative includes to absolute 2016-04-01 17:46:44 +03:00
Ari Koivula 4876879b82 Add IWYU pragmas 2016-03-31 12:33:34 +03:00
Ari Koivula 5b66578f71 Add kvz_ prefix to md5 functions
The non kvz_ symbols were being exported in the static lib, which got caught
by Travis tests.
2016-03-18 13:13:35 +02:00
Ari Koivula 4125218cfa Add --hash=md5
Add md5 through extras/libmd5 taken from HM with BSD license. It's
implemented as a generic strategy using the same interface as checksum,
so we can write a SIMD version if it seems necessary.
2016-03-18 05:23:57 +02:00
Ari Lemmetti e502292ba8 Remove old function 2016-03-16 20:18:55 +02:00
Ari Lemmetti c6cc96f5ec Optimize sao band ddistortion 2016-03-16 20:16:00 +02:00
Ari Lemmetti ab577f476f Optimize sao reconstruct color 2016-03-16 20:15:32 +02:00
Ari Lemmetti 48bfddf4ec Optimize calc sao edge dir 2016-03-16 20:14:50 +02:00
Ari Lemmetti ba69992941 Optimize sao edge ddistortion 2016-03-16 20:14:19 +02:00
Ari Lemmetti 941b6b3e27 Optimize calc eo cat 2016-03-16 20:13:30 +02:00
Ari Lemmetti 04fbb48a09 Add strategy for avx2. Copy generic functions there. 2016-03-16 20:13:15 +02:00
Ari Lemmetti 4e30a215d8 Create generic strategy for sao. 2016-03-16 20:11:15 +02:00
Ari Lemmetti 99e37ec235 Update old pixel type to the current one 2016-01-30 19:33:09 +02:00
Ari Koivula fa1af14637 Fix includes to include global.h first everywhere 2016-01-22 15:07:49 +02:00
Ari Lemmetti 44656aeb19 Remove useless calculation 2016-01-19 16:35:16 +02:00
Ari Lemmetti a2fc9920e6 Merge branch 'alternative-satd' 2016-01-13 15:00:43 +02:00
Ari Lemmetti 1ed34f2df8 Add some planar pred optimization for blocks larger than 8x8 2016-01-13 14:50:17 +02:00
Ari Lemmetti 0df88697ff Copy generic function to AVX2 strategy 2016-01-12 23:51:18 +02:00
Ari Lemmetti 62799a9fc3 Create generic strategy of planar prediction 2016-01-12 23:50:47 +02:00
Ari Lemmetti 3cb1cebfe5 Add missing inlines 2016-01-12 23:03:31 +02:00
Ari Lemmetti 6a0b13b8b6 Remove unused functions 2016-01-12 22:55:37 +02:00
Ari Lemmetti 61155f0edd Add 128-bit version of the functions as well 2016-01-12 22:52:00 +02:00
Ari Lemmetti a6afb8a8f4 Small refactoring 2016-01-12 22:29:33 +02:00
Ari Lemmetti a756f6133a Manually unroll vertical Hadamard transform 2016-01-12 21:45:02 +02:00
Ari Lemmetti 66350aa20e Experiment with alternative implementation of FWHT 2016-01-11 16:25:56 +02:00
Ari Koivula 947bae24f9 Update Doxygen documentation
Add module information to all header files.

Update all header file documentations to briefly say what they are, and
to use the javadoc format so the brief actually gets included into the
doxygen documentation.

Remove \file from implementation files, in order to not repeat the info
from the header files.

Add files under strategies and tools to Doxygen and update the Doxygen
settings to be just plain better.

Make README be the main page of Doxygen documentation.
2015-12-17 14:05:50 +02:00
Arttu Ylä-Outinen 864c77f6eb Use kvz_satd_any_size in inter search.
Changes search_frac and kvz_search_cu_iter to use kvz_satd_any_size for
computing the SATDs instead of getting the SATD function with
kvz_pixels_get_satd_func.
2015-12-15 11:21:45 +02:00
Arttu Ylä-Outinen 056fa09ba5 Add arbitrary-sized SATD functions.
Adds strategy satd_any_size for generic and AVX2. The satd_any_size
functions are implemented with macro SATD_ANY_SIZE defined in
strategies-picture.h.
2015-12-15 11:21:45 +02:00
Arttu Ylä-Outinen 728a6abecc Extract macro SATD_NxN.
Combines definitions of macros SATD_NXN and SATD_NXN_AVX2 to macro
SATD_NxN and moves it to strategies-picture.h.
2015-12-15 11:21:44 +02:00
Arttu Ylä-Outinen 4402e251ae Fix kvz_get_extended_block functions.
The buffers allocated in functions kvz_get_extended_block_avx2 and
kvz_get_extended_block_generic were too small when the width of the
block was less than its height. Fixed to allocate correctly sized
buffers.
2015-12-15 11:21:43 +02:00
Ari Lemmetti b78460b02c Optimize another loop 2015-12-11 11:21:43 +02:00
Ari Lemmetti ee8c2d0218 Add 4x4 dual SATD for AVX2 2015-12-03 17:13:11 +02:00
Ari Lemmetti 00736fa708 Generate larger than 8x8 dual satd functions with macro 2015-12-03 17:13:11 +02:00
Ari Lemmetti bd3e1922cd Add AVX2 8x8 dual hadamard transform 2015-12-03 17:13:11 +02:00
Ari Lemmetti d575b94357 Implement generic functions for dual sad / satd 2015-12-03 17:13:11 +02:00
Ari Lemmetti 183ee53f47 Add alternative version of rough intra search.
Calculate two costs simultaneously to exploit larger SIMD registers.
Implementation for dual functions missing currently.
2015-12-03 17:12:38 +02:00
Arttu Ylä-Outinen 940ada4c0d Mark AVX2 intra filter functions as static.
Marks functions filter_4x4_avx2, filter_16x16_avx2 and filter_NxN_avx2
static as they are not used outside strategies/avx2/intra-avx2.
2015-11-09 12:48:20 +02:00
Ari Lemmetti fbd0596114 Merge branch 'avx2-pixels-blit' 2015-11-04 11:06:10 +02:00
Ari Lemmetti 57ea7d223b Pass SIMD registers to functions as pointers to fix 32-bit compilation in visual studio 2015-11-04 10:51:26 +02:00
Ari Lemmetti a3855652e9 Add AVX2 version with separate handling of basic blocks and strideless copy. 2015-11-04 10:07:25 +02:00
Ari Lemmetti 0816fbea2c Create generic strategy of blit function 2015-11-04 10:07:25 +02:00
Marko Viitanen 821d5c478b Added missing parameter to kvz_strategy_register_picture_generic() 2015-11-02 08:55:54 +02:00
Ari Lemmetti d71f1b5bd0 Disable incompatible optimizations for 32-bit version 2015-10-24 15:32:27 +03:00
Ari Lemmetti df995d85e8 Utilize AVX2 for dequantization. 2015-10-23 20:17:08 +03:00
Ari Lemmetti cf347e33c4 Move dequant to strategies. Copy generic to AVX2 as well. 2015-10-23 19:53:50 +03:00
Ari Lemmetti 47082738aa ...and the same tricks for quantized reconstruction 2015-10-23 19:44:38 +03:00
Ari Lemmetti 7961ba80d8 Add functions for bigger block sizes to calculate more residual simultaneously and reduce memory accesses 2015-10-23 19:11:56 +03:00
Ari Lemmetti 15edd5060d Load and store multiple elements simultaneously. Use 128-bit wide zero
test. *wip*
2015-10-23 17:03:16 +03:00
Ari Lemmetti b37cca87c8 Copy generic to avx2 2015-10-23 17:03:15 +03:00
Ari Lemmetti cad2ea9d6e Move quantize_residual to quant strategies. 2015-10-23 17:03:15 +03:00
Ari Lemmetti 0c63041ba7 Add filtering functions for different block sizes. Simplify logic a bit to reduce branching. Sorry for the large commit! 2015-10-23 16:54:15 +03:00
Ari Lemmetti 5af7a42ebe Enable AVX2 strategy. Add first version of optimizations. 2015-10-08 12:36:20 +03:00
Ari Lemmetti f4fe3dca5e Add AVX2 strategy. Copy generic implementation there. 2015-10-08 12:36:15 +03:00
Ari Lemmetti 54e8b346a3 Add intra strategy. Move angular prediction there. 2015-10-08 12:36:05 +03:00
Ari Lemmetti 38106afa50 Add AVX2 version of quantization. 2015-10-02 16:18:52 +03:00
Ari Lemmetti ef0ad292ef Add quantization strategy. 2015-10-02 16:17:02 +03:00
Ari Lemmetti 989cee1b04 Add 4x4 function as well 2015-10-01 22:14:56 +03:00
Ari Lemmetti 8b57b2bb1a Refactor SATD to inline most of the function. Replace full horizontal add with shuffle and regular packed add. 2015-10-01 21:29:25 +03:00
Ari Lemmetti 55da2a9958 Add intrinsic version of SATD for 8x8 and larger blocks 2015-10-01 19:42:22 +03:00
Ari Lemmetti d68fc4c41e Add header for common utilities to use with strategies. 2015-10-01 19:40:35 +03:00
Ari Koivula 9a23ae3d92 Resolve remaining Visual Studio warnings.
- Ignore most of them and fix the ones that can't be ignored.
2015-08-31 15:02:25 +03:00
Arttu Ylä-Outinen 3a10e9e3e0 Prefix all non-static symbols with "kvz_". 2015-08-26 13:02:28 +03:00
Arttu Ylä-Outinen bfe2b31cee Make generic satd functions static. 2015-08-26 12:10:27 +03:00
Ari Lemmetti 923f4a74d5 Fix filtering over limits 2015-08-17 17:39:56 +03:00
Ari Lemmetti 82cf4e8ff4 Output error messages to stderr 2015-08-17 15:01:46 +03:00
Ari Lemmetti 3da71b62bf Add checks if malloc fails 2015-08-17 15:01:46 +03:00
Ari Lemmetti 4718fe7fda Change variable names to match used convention 2015-08-17 15:01:46 +03:00
Ari Lemmetti 6a5eaf08de Rename extend_borders to get_extended_block. Add kvz_ prefix to type definition. 2015-08-17 15:01:46 +03:00
Ari Lemmetti d82582c37c Changes to extend border function.
Now outputs a pointer to a block with guaranteed padding for filtering.
Only generate extra pixels if samples are needed out of bounds.
Use memcpy otherwise.
2015-08-17 15:01:46 +03:00
Ari Lemmetti 5d96dbc6c0 Make strategy selection use bit depth given via parameter instead of excluding registration with defines 2015-08-12 13:33:38 +03:00
Ari Lemmetti 4122f36089 Prevent the registration of strategies that are incompatible when KVZ_BIT_DEPTH != 8
Remove unnecessary or misleading mentions of "8bit"
2015-08-12 11:29:53 +03:00
Ari Lemmetti 348d7780fc Remove third shift and offset from 14-bit sampling functions (change missing from rebase) 2015-08-11 15:06:16 +03:00
Marko Viitanen 8409317bd9 Fixed rebasing errors for 10bit branch 2015-08-11 14:56:45 +03:00
Marko Viitanen 6453a511d7 Scale SAD/SATD costs to match bit depth
Conflicts:
	src/image.c
2015-08-11 08:18:14 +03:00
Marko Viitanen 0304b6c412 Fixed luma interpolation filter when 10bit coding and some other minor fixes 2015-08-11 08:17:48 +03:00
Marko Viitanen 450b5e64ca Fixed overflow on generic ipol filters when 10bit encoding
Conflicts:
	src/strategies/generic/ipol-generic.c
2015-08-11 08:17:48 +03:00
Marko Viitanen 414ebe6101 Fixed checksum on bitdepth > 8 cases
Conflicts:
	src/nal.c
	src/nal.h
	src/strategies/generic/nal-generic.c
	src/strategies/strategies-nal.c
	src/strategies/strategies-nal.h
2015-08-11 08:14:35 +03:00
Marko Viitanen 57ab46f110 Small fixes all around to enable 10bit encoding
Conflicts:
	src/encmain.c
	src/encoder.c
	src/encoderstate.c
	src/global.h
2015-08-11 07:59:20 +03:00
Ari Lemmetti 5887c96991 Add and use 14bit reconstruction for fractional motion vectors with bipred 2015-08-10 18:45:29 +03:00
Ari Lemmetti 8b4a6c92da Add 14bit precision sample functions. 2015-08-10 18:02:06 +03:00
Ari Lemmetti b30f17d4b8 Add fractional pixel sampling for chroma 2015-08-10 17:55:37 +03:00
Ari Lemmetti 01f40ec104 Add fractional pixel sampling for luma 2015-08-10 17:51:48 +03:00
Ari Koivula 0c3c93d456 Optimize intra SAD intrinsics.
- Added 64x64 version for completeness.
- With the exception of 16x16, these were all slightly slower than the ASM
  versions, as measured by "kvazaar_test -s speed -t intra_sad", but now they
  are on par or slightly faster.
- None of these actually use any AVX2 intrinsics, and probably never will,
  unless someone adds an interface for doing more than one block at a time,
  in which case the non-destructive versions might come in handy.
2015-08-06 19:35:00 +03:00
Arttu Ylä-Outinen f7f17a060c Rename pixel_t to kvz_pixel. 2015-07-02 16:58:28 +03:00
Arttu Ylä-Outinen fab07d80da Rename macro BIT_DEPTH to KVZ_BIT_DEPTH. 2015-07-02 16:55:47 +03:00
Marko Viitanen 8ed5d06ebe Fixed compiler warnings caused by the bipred branch merge 2015-04-23 15:12:48 +03:00