Commit graph

304 commits

Author SHA1 Message Date
Reima Hyvönen 17babfffa4 25.6 working optimization, ~50% faster than original 2018-06-25 17:06:16 +03:00
Reima Hyvönen 9fed29f950 Optimization for inter_recon_bipred 2018-04-18 15:25:44 +03:00
Arttu Ylä-Outinen 0a69e6d18f Fix selection of transform function for 4x4 blocks
DST function was returned for inter luma transform blocks of size 4x4
even though they must use DCT. Fixed by checking the prediction mode of
the block in addition to whether it is chroma or luma.
2018-01-18 10:36:25 +02:00
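The rule described in the commit above can be sketched as follows. This is a minimal illustration, not kvazaar's actual code: the type names and select_transform are hypothetical. It relies only on the HEVC rule that the DST applies to 4x4 intra-predicted luma blocks and that every other block uses the DCT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types for illustration; not kvazaar's real definitions. */
typedef enum { PRED_INTRA, PRED_INTER } pred_mode_t;
typedef enum { COLOR_LUMA, COLOR_CHROMA } color_t;
typedef enum { TRANSFORM_DCT, TRANSFORM_DST } transform_t;

/* Pick the transform for a square block of the given width.
 * DST is chosen only when all three conditions hold:
 * 4x4 size, luma plane, intra prediction. */
static transform_t select_transform(int width, color_t color, pred_mode_t mode)
{
  assert(width == 4 || width == 8 || width == 16 || width == 32);
  bool use_dst = (width == 4) && (color == COLOR_LUMA) && (mode == PRED_INTRA);
  return use_dst ? TRANSFORM_DST : TRANSFORM_DCT;
}
```

Checking only size and colour plane would hand the DST to a 4x4 inter luma block, which is the bug the commit describes.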
Arttu Ylä-Outinen 9694bd2fae Fix build on 32-bit systems
Function coeff_abs_sum_avx2 that was added in e950c9b was outside the
AVX2 #if directive.
2017-07-28 09:19:29 +03:00
Arttu Ylä-Outinen e950c9b101 Add AVX2 implementation for coefficient sum 2017-07-28 07:39:36 +03:00
Arttu Ylä-Outinen d50ae6990c Add sum of absolute coefficients to strategies 2017-07-28 07:39:15 +03:00
Arttu Ylä-Outinen fdb3480b54 Enable strategies for SAO reconstruction
Re-enables strategies for SAO reconstruction. They were disabled in
commit ec9ff42.
2017-07-11 10:35:18 +03:00
Arttu Ylä-Outinen 333dba3884 Add static to SAO strategies 2017-07-11 10:02:01 +03:00
Arttu Ylä-Outinen 563bc26e71 Fix out-of-bounds read in AVX2 SAO
AVX2 version of SAO loaded offsets with a 256 bit read even though there
are only five 32 bit integers.
2017-07-06 13:04:52 +03:00
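One common way to avoid this kind of over-read is to copy the short array into a buffer as wide as the register before loading it. The sketch below is portable C and only shows the padding step. The commit does not say which fix was actually applied, and pad_offsets and the constants are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_SAO_OFFSETS 5  /* five 32-bit offsets, as stated in the commit */
#define LANES_256 8        /* a 256-bit register holds eight 32-bit lanes */

/* Copy the five offsets into a zero-padded eight-lane buffer so that a
 * later full-width load touches only memory owned by the caller. */
static void pad_offsets(const int32_t *offsets, int32_t *padded)
{
  assert(offsets != NULL && padded != NULL);
  memset(padded, 0, LANES_256 * sizeof(int32_t));
  memcpy(padded, offsets, NUM_SAO_OFFSETS * sizeof(int32_t));
}
```

A 256-bit load from the padded buffer is then in bounds, and the three extra lanes hold zeros rather than whatever follows the original array.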
Arttu Ylä-Outinen 2c66e0bbd2 Fix warnings about invalid reads in AVX2 ipol
AVX2 filter functions read pixels in chunks of 8 or 16 bytes. At the end
of the block, the read goes out of the bounds of the pixels array. The
extra pixels do not affect the result.

Fixes valgrind complaining about the invalid reads by allocating 5 extra
pixels in kvz_get_extended_block_avx2
2017-06-22 09:37:55 +03:00
Arttu Ylä-Outinen 95775a1645 Change coefficient storage order
Changes coefficient storage order to a zig-zag order. Reduces
unnecessary copying of coefficients to temporary arrays.
2017-05-12 16:46:57 +03:00
Arttu Ylä-Outinen 51786eda67 Drop redundant fields in encoder_control_t
Some of the fields in encoder_control_t were simply copies of the
corresponding fields in kvz_config. This commit drops the copied fields
in favor of using the fields in encoder_control_t.cfg directly.
2017-02-09 14:05:28 +09:00
Arttu Ylä-Outinen e78a8dfcf5 Copy the kvz_config passed to encoder_open
The kvz_config struct is created by the user but kvazaar keeps a pointer
to it. It is easy to break things by modifying the configuration outside
kvazaar. In addition, kvazaar modifies the struct even though it has
a const modifier.

This commit changes the field cfg in encoder_control_t to be a copy of
the kvz_config struct instead of a pointer, removing modifications to
the const struct and allowing users to do whatever they want with it
after opening the encoder.
2017-02-09 13:23:54 +09:00
Arttu Ylä-Outinen 640ff94ecd Use separate lambda and QP for each LCU
Adds fields lambda, lambda_sqrt and qp to encoder_state_t. Drops field
cur_lambda_cost_sqrt from encoder_state_config_frame_t and renames
cur_lambda_cost to lambda.
2017-01-09 01:24:23 +09:00
Ari Lemmetti 70a52f0e48 10-bit: add missing bit depth adjustment to ssd 2016-11-17 19:28:04 +02:00
Ari Lemmetti 29153ed503 Remove unused variable 2016-10-21 17:28:42 +03:00
Ari Lemmetti 778e46dfd8 Add AVX2 version of SSD 2016-10-21 15:07:53 +03:00
Ari Lemmetti 6f5d7c9e06 Move SSD to strategies 2016-10-21 15:07:23 +03:00
Ari Lemmetti 89b941eab4 Fix typo 2016-10-21 15:07:02 +03:00
Ari Koivula cbfa824d1a Merge branch 'simd' 2016-09-27 20:49:45 +03:00
Ari Koivula 14a7bcba25 Use a faster function for clipped inter SAD
Use the vectorized general SSE41 inter SAD in AVX reg_sad for shapes
for which we don't have AVX versions yet.

Also improves speed of --smp and --amp a lot. Got a 1.25x speedup for:
--preset=ultrafast -q 27 --gop=lp-g4d3r3t1 --me-early-termination=on --rd=1 --pu-depth-inter=1-3 --smp --amp

* Suite speed_tests:
-PASS inter_sad: 0.898M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 2.503M x reg_sad(64x63):x86_asm_avx (1000 ticks, 1.000 sec)
-PASS inter_sad: 115.054M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
+PASS inter_sad: 133.577M x reg_sad(1x1):x86_asm_avx (1000 ticks, 1.000 sec)
2016-09-27 20:48:30 +03:00
Eemeli Kallio f41e428e5f Removed kvz_skip_unnecessary_rdoq and reworked --rdoq-skip to skip 4x4 blocks when it is on. 2016-09-09 10:26:07 +03:00
Ari Koivula 02cd17b427 Add faster AVX inter SAD for 32x32 and 64x64
Add implementations for these functions that process the image line by
line instead of using the 16x16 function to process block by block.

The 32x32 is around 30% faster, and 64x64 is around 15% faster,
on Haswell.

PASS inter_sad: 28.744M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec)
PASS inter_sad: 7.882M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)
to
PASS inter_sad: 37.828M x reg_sad(32x32):x86_asm_avx (1014 ticks, 1.014 sec)
PASS inter_sad: 9.081M x reg_sad(64x64):x86_asm_avx (1014 ticks, 1.014 sec)
2016-09-01 21:36:39 +03:00
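The line-by-line traversal mentioned above can be illustrated with a scalar sketch. This is not the AVX assembly from the commit. demo_sad_rows is a hypothetical helper that walks one full row at a time, in contrast to tiling the block into 16x16 sub-blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a width x height block, visiting
 * whole rows in order. Both buffers are assumed to share the same stride. */
static unsigned demo_sad_rows(const uint8_t *a, const uint8_t *b,
                              int width, int height, int stride)
{
  assert(width > 0 && height > 0 && width <= stride);
  unsigned sad = 0;
  for (int y = 0; y < height; ++y) {
    const uint8_t *row_a = a + (size_t)y * (size_t)stride;
    const uint8_t *row_b = b + (size_t)y * (size_t)stride;
    for (int x = 0; x < width; ++x) {
      sad += (unsigned)abs((int)row_a[x] - (int)row_b[x]);
    }
  }
  return sad;
}
```

Row-wise traversal reads each line contiguously, which is one plausible reason a dedicated 32x32 or 64x64 routine can beat repeated 16x16 calls. The commit gives the measurements but not the explanation.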
Ari Lemmetti 28c4174d0e Fix incorrect shuffle parameters
_MM_SHUFFLE uses reverse order
2016-08-23 19:40:46 +03:00
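The reverse ordering is easy to get wrong, so here is a small portable sketch of it. DEMO_SHUFFLE reproduces the bit layout of _MM_SHUFFLE(z, y, x, w) from xmmintrin.h so that the example builds without SSE headers. demo_selected_lane is a hypothetical helper for decoding the result.

```c
#include <assert.h>

/* Same bit layout as _MM_SHUFFLE(z, y, x, w): the FIRST argument lands
 * in the highest two bits and therefore controls destination lane 3. */
#define DEMO_SHUFFLE(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))

/* Return the source lane index that the immediate selects for the
 * given destination lane (0 is the lowest lane). */
static int demo_selected_lane(int imm, int dst_lane)
{
  assert(dst_lane >= 0 && dst_lane < 4);
  return (imm >> (2 * dst_lane)) & 3;
}
```

The identity shuffle is therefore written (3, 2, 1, 0), not (0, 1, 2, 3). The latter reverses the lanes.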
Ari Lemmetti ce77bfa15b Replace KVZ_PERMUTE with _MM_SHUFFLE
The same exact macro already exists
2016-08-22 19:08:46 +03:00
Eemeli Kallio 99d8b9abeb Changed skip_rdoq name to kvz_skip_unnecessary_rdoq. Changed the order it uses when it goes through CGs and tuned its sum calculation. 2016-08-18 14:02:56 +03:00
Eemeli Kallio 1fb4755f31 Added rdoq-skip to quant-generic.c 2016-08-18 12:17:54 +03:00
Eemeli Kallio d20ac03ca2 Added --rdoq-skip option 2016-08-18 12:17:53 +03:00
Arttu Ylä-Outinen 2a946bd88e Rename encoder_state_t.global to frame
"Frame" is more accurate than "global" since when OWF is used, encoder
states for each frame have their own struct.
2016-08-10 13:22:36 +09:00
Arttu Ylä-Outinen 5fbb0a8c27 Fix includes 2016-08-10 13:05:40 +09:00
Ari Lemmetti 6bcba004ff Comment out to fix unused code error on clang. 2016-07-14 14:12:16 +03:00
Ari Lemmetti c0979ebdcb Implement AVX2 luma sampling 2016-07-14 12:53:02 +03:00
Ari Lemmetti 6244560426 Add avx2 strategy for kvz_filter_frac_blocks_luma. 2016-07-14 12:53:02 +03:00
Ari Lemmetti 9c4e9e049b Load only what is needed. Eliminate latency from hadds. 2016-07-14 12:53:01 +03:00
Ari Lemmetti fccfbd2f28 Add strategy for kvz_filter_frac_blocks_luma 2016-07-14 12:51:02 +03:00
Ari Lemmetti 2b0c8db349 Add quad satd for avx2 2016-07-14 12:50:24 +03:00
Ari Lemmetti 0ff69fd6f8 Add any size multi satd 2016-07-14 12:48:37 +03:00
Arttu Ylä-Outinen bf26661782 Add support for 4x4 blocks to SATD_ANY_SIZE.
Makes functions satd_any_size_generic and satd_any_size_8bit_avx2 work
on blocks whose width and/or height are not multiples of 8.
2016-06-16 18:53:17 +09:00
Ari Lemmetti 3107a93eaf Fix avx2 chroma sampling for amp 2016-05-17 14:09:57 +03:00
Ari Lemmetti efbdc5dade Utilize registers more efficiently for 8x8 and larger blocks 2016-04-21 13:26:38 +03:00
Ari Lemmetti 192cee95b2 Vectorize vertical filtering 2016-04-21 13:26:38 +03:00
Ari Lemmetti 0be35f72b8 Filter 4 pixels simultaneously in x direction 2016-04-21 13:26:38 +03:00
Ari Lemmetti 10484bda9f Make strategies out of fractional pixel sample functions 2016-04-21 13:26:38 +03:00
Ari Lemmetti 8247faf8e0 Remove 64-bit only instruction to fix 32-bit compilation. 2016-04-19 18:05:11 +03:00
Ari Lemmetti eb55d6b6b9 Fix writing over boundary. 2016-04-19 16:03:43 +03:00
Ari Lemmetti bcabc6fadd Remove pixel blit from strategies. Use memcpy instead. 2016-04-06 18:44:04 +03:00
Ari Koivula 61fc3e87ba Run include-what-you-use fix_includes.py
The includes should make more sense now and not just happen to compile
due to headers included from other headers.

Used a modified version of IWYU. Modifications were to attribute int8_t
and so on to stdint.h instead of sys/types.h and immintrin.h instead of
more specific headers.

include-what-you-use 0.7 (git:b70df35)
based on clang version 3.9.0 (trunk 264728)
2016-04-01 17:46:55 +03:00
Ari Koivula 8908d85d66 Change all relative includes to absolute 2016-04-01 17:46:44 +03:00
Ari Koivula 4876879b82 Add IWYU pragmas 2016-03-31 12:33:34 +03:00
Ari Koivula 5b66578f71 Add kvz_ prefix to md5 functions
The non kvz_ symbols were being exported in the static lib, which got caught
by Travis tests.
2016-03-18 13:13:35 +02:00
Ari Koivula 4125218cfa Add --hash=md5
Add md5 through extras/libmd5 taken from HM with BSD license. It's
implemented as a generic strategy using the same interface as checksum,
so we can write a SIMD version if it seems necessary.
2016-03-18 05:23:57 +02:00
Ari Lemmetti e502292ba8 Remove old function 2016-03-16 20:18:55 +02:00
Ari Lemmetti c6cc96f5ec Optimize sao band ddistortion 2016-03-16 20:16:00 +02:00
Ari Lemmetti ab577f476f Optimize sao reconstruct color 2016-03-16 20:15:32 +02:00
Ari Lemmetti 48bfddf4ec Optimize calc sao edge dir 2016-03-16 20:14:50 +02:00
Ari Lemmetti ba69992941 Optimize sao edge ddistortion 2016-03-16 20:14:19 +02:00
Ari Lemmetti 941b6b3e27 Optimize calc eo cat 2016-03-16 20:13:30 +02:00
Ari Lemmetti 04fbb48a09 Add strategy for avx2. Copy generic functions there. 2016-03-16 20:13:15 +02:00
Ari Lemmetti 4e30a215d8 Create generic strategy for sao. 2016-03-16 20:11:15 +02:00
Ari Lemmetti 99e37ec235 Update old pixel type to the current one 2016-01-30 19:33:09 +02:00
Ari Koivula fa1af14637 Fix includes to include global.h first everywhere 2016-01-22 15:07:49 +02:00
Ari Lemmetti 44656aeb19 Remove useless calculation 2016-01-19 16:35:16 +02:00
Ari Lemmetti a2fc9920e6 Merge branch 'alternative-satd' 2016-01-13 15:00:43 +02:00
Ari Lemmetti 1ed34f2df8 Add some planar pred optimization for blocks larger than 8x8 2016-01-13 14:50:17 +02:00
Ari Lemmetti 0df88697ff Copy generic function to AVX2 strategy 2016-01-12 23:51:18 +02:00
Ari Lemmetti 62799a9fc3 Create generic strategy of planar prediction 2016-01-12 23:50:47 +02:00
Ari Lemmetti 3cb1cebfe5 Add missing inlines 2016-01-12 23:03:31 +02:00
Ari Lemmetti 6a0b13b8b6 Remove unused functions 2016-01-12 22:55:37 +02:00
Ari Lemmetti 61155f0edd Add 128-bit version of the functions as well 2016-01-12 22:52:00 +02:00
Ari Lemmetti a6afb8a8f4 Small refactoring 2016-01-12 22:29:33 +02:00
Ari Lemmetti a756f6133a Manually unroll vertical Hadamard transform 2016-01-12 21:45:02 +02:00
Ari Lemmetti 66350aa20e Experiment with alternative implementation of FWHT 2016-01-11 16:25:56 +02:00
Ari Koivula 947bae24f9 Update Doxygen documentation
Add module information to all header files.

Update all header file documentations to briefly say what they are, and
to use the javadoc format so the brief actually gets included into the
doxygen documentation.

Remove \file from implementation files, in order to not repeat the info
from the header files.

Add files under strategies and tools to Doxygen and update the Doxygen
settings to be just plain better.

Make README be the main page of Doxygen documentation.
2015-12-17 14:05:50 +02:00
Arttu Ylä-Outinen 864c77f6eb Use kvz_satd_any_size in inter search.
Changes search_frac and kvz_search_cu_iter to use kvz_satd_any_size for
computing the SATDs instead of getting the SATD function with
kvz_pixels_get_satd_func.
2015-12-15 11:21:45 +02:00
Arttu Ylä-Outinen 056fa09ba5 Add arbitrary-sized SATD functions.
Adds strategy satd_any_size for generic and AVX2. The satd_any_size
functions are implemented with macro SATD_ANY_SIZE defined in
strategies-picture.h.
2015-12-15 11:21:45 +02:00
Arttu Ylä-Outinen 728a6abecc Extract macro SATD_NxN.
Combines definitions of macros SATD_NXN and SATD_NXN_AVX2 to macro
SATD_NxN and moves it to strategies-picture.h.
2015-12-15 11:21:44 +02:00
Arttu Ylä-Outinen 4402e251ae Fix kvz_get_extended_block functions.
The buffers allocated in functions kvz_get_extended_block_avx2 and
kvz_get_extended_block_generic were too small when the width of the
block was less than its height. Fixed to allocate correctly sized
buffers.
2015-12-15 11:21:43 +02:00
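The commit does not show the faulty size computation, so the following is only a sketch of a size calculation that treats width and height independently. demo_extended_block_size and the filter_size parameter are hypothetical. The assumption is that an N-tap filter needs N - 1 extra samples along each dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pixels needed for a block extended by filter_size - 1
 * samples in each dimension. Width and height are handled separately,
 * so tall, narrow blocks get a large enough buffer. */
static size_t demo_extended_block_size(int width, int height, int filter_size)
{
  assert(width > 0 && height > 0 && filter_size > 0);
  size_t ext_width = (size_t)width + (size_t)filter_size - 1;
  size_t ext_height = (size_t)height + (size_t)filter_size - 1;
  return ext_width * ext_height;
}
```

For a 4x16 block and an 8-tap filter this gives 11 * 23 = 253 pixels. A computation based on the width alone would give 11 * 11 = 121, which is too small.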
Ari Lemmetti b78460b02c Optimize another loop 2015-12-11 11:21:43 +02:00
Ari Lemmetti ee8c2d0218 Add 4x4 dual SATD for AVX2 2015-12-03 17:13:11 +02:00
Ari Lemmetti 00736fa708 Generate larger than 8x8 dual satd functions with macro 2015-12-03 17:13:11 +02:00
Ari Lemmetti bd3e1922cd Add AVX2 8x8 dual hadamard transform 2015-12-03 17:13:11 +02:00
Ari Lemmetti d575b94357 Implement generic functions for dual sad / satd 2015-12-03 17:13:11 +02:00
Ari Lemmetti 183ee53f47 Add alternative version of rough intra search.
Calculate two costs simultaneously to exploit larger SIMD registers.
Implementation for dual functions missing currently.
2015-12-03 17:12:38 +02:00
Arttu Ylä-Outinen 940ada4c0d Mark AVX2 intra filter functions as static.
Marks functions filter_4x4_avx2, filter_16x16_avx2 and filter_NxN_avx2
static as they are not used outside strategies/avx2/intra-avx2.
2015-11-09 12:48:20 +02:00
Ari Lemmetti fbd0596114 Merge branch 'avx2-pixels-blit' 2015-11-04 11:06:10 +02:00
Ari Lemmetti 57ea7d223b Pass SIMD registers to functions as pointers to fix 32-bit compilation in visual studio 2015-11-04 10:51:26 +02:00
Ari Lemmetti a3855652e9 Add AVX2 version with separate handling of basic blocks and strideless copy. 2015-11-04 10:07:25 +02:00
Ari Lemmetti 0816fbea2c Create generic strategy of blit function 2015-11-04 10:07:25 +02:00
Marko Viitanen 821d5c478b Added missing parameter to kvz_strategy_register_picture_generic() 2015-11-02 08:55:54 +02:00
Ari Lemmetti d71f1b5bd0 Disable incompatible optimizations for 32-bit version 2015-10-24 15:32:27 +03:00
Ari Lemmetti df995d85e8 Utilize AVX2 for dequantization. 2015-10-23 20:17:08 +03:00
Ari Lemmetti cf347e33c4 Move dequant to strategies. Copy generic to AVX2 as well. 2015-10-23 19:53:50 +03:00
Ari Lemmetti 47082738aa ...and the same tricks for quantized reconstruction 2015-10-23 19:44:38 +03:00
Ari Lemmetti 7961ba80d8 Add functions for bigger block sizes to calculate more residual simultaneously and reduce memory accesses 2015-10-23 19:11:56 +03:00
Ari Lemmetti 15edd5060d Load and store multiple elements simultaneously. Use 128-bit wide zero
test. *wip*
2015-10-23 17:03:16 +03:00
Ari Lemmetti b37cca87c8 Copy generic to avx2 2015-10-23 17:03:15 +03:00
Ari Lemmetti cad2ea9d6e Move quantize_residual to quant strategies. 2015-10-23 17:03:15 +03:00
Ari Lemmetti 0c63041ba7 Add filtering functions for different block sizes. Simplify logic a bit to reduce branching. Sorry for the large commit! 2015-10-23 16:54:15 +03:00
Ari Lemmetti 5af7a42ebe Enable AVX2 strategy. Add first version of optimizations. 2015-10-08 12:36:20 +03:00
Ari Lemmetti f4fe3dca5e Add AVX2 strategy. Copy generic implementation there. 2015-10-08 12:36:15 +03:00
Ari Lemmetti 54e8b346a3 Add intra strategy. Move angular prediction there. 2015-10-08 12:36:05 +03:00
Ari Lemmetti 38106afa50 Add AVX2 version of quantization. 2015-10-02 16:18:52 +03:00
Ari Lemmetti ef0ad292ef Add quantization strategy. 2015-10-02 16:17:02 +03:00
Ari Lemmetti 989cee1b04 Add 4x4 function as well 2015-10-01 22:14:56 +03:00
Ari Lemmetti 8b57b2bb1a Refactor SATD to inline most of the function. Replace full horizontal add with shuffle and regular packed add. 2015-10-01 21:29:25 +03:00
Ari Lemmetti 55da2a9958 Add intrinsic version of SATD for 8x8 and larger blocks 2015-10-01 19:42:22 +03:00
Ari Lemmetti d68fc4c41e Add header for common utilities to use with strategies. 2015-10-01 19:40:35 +03:00
Ari Koivula 9a23ae3d92 Resolve remaining Visual Studio warnings.
- Ignore most of them and fix the ones that can't be ignored.
2015-08-31 15:02:25 +03:00
Arttu Ylä-Outinen 3a10e9e3e0 Prefix all non-static symbols with "kvz_". 2015-08-26 13:02:28 +03:00
Arttu Ylä-Outinen bfe2b31cee Make generic satd functions static. 2015-08-26 12:10:27 +03:00
Ari Lemmetti 923f4a74d5 Fix filtering over limits 2015-08-17 17:39:56 +03:00
Ari Lemmetti 82cf4e8ff4 Output error messages to stderr 2015-08-17 15:01:46 +03:00
Ari Lemmetti 3da71b62bf Add checks if malloc fails 2015-08-17 15:01:46 +03:00
Ari Lemmetti 4718fe7fda Change variable names to match used convention 2015-08-17 15:01:46 +03:00
Ari Lemmetti 6a5eaf08de Rename extend_borders to get_extended_block. Add kvz_ prefix to type definition. 2015-08-17 15:01:46 +03:00
Ari Lemmetti d82582c37c Changes to extend border function.
Now outputs a pointer to a block with guaranteed padding for filtering.
Only generate extra pixels if samples are needed out of bounds.
Use memcpy otherwise.
2015-08-17 15:01:46 +03:00
Ari Lemmetti 5d96dbc6c0 Make strategy selection use bit depth given via parameter instead of excluding registration with defines 2015-08-12 13:33:38 +03:00
Ari Lemmetti 4122f36089 Prevent the registration of strategies that are incompatible when KVZ_BIT_DEPTH != 8
Remove unnecessary or misleading mentions of "8bit"
2015-08-12 11:29:53 +03:00
Ari Lemmetti 348d7780fc Remove third shift and offset from 14-bit sampling functions (change missing from rebase) 2015-08-11 15:06:16 +03:00
Marko Viitanen 8409317bd9 Fixed rebasing errors for 10bit branch 2015-08-11 14:56:45 +03:00
Marko Viitanen 6453a511d7 Scale SAD/SATD costs to match bit depth
Conflicts:
	src/image.c
2015-08-11 08:18:14 +03:00
Marko Viitanen 0304b6c412 Fixed luma interpolation filter when 10bit coding and some other minor fixes 2015-08-11 08:17:48 +03:00
Marko Viitanen 450b5e64ca Fixed overflow on generic ipol filters when 10bit encoding
Conflicts:
	src/strategies/generic/ipol-generic.c
2015-08-11 08:17:48 +03:00
Marko Viitanen 414ebe6101 Fixed checksum on bitdepth > 8 cases
Conflicts:
	src/nal.c
	src/nal.h
	src/strategies/generic/nal-generic.c
	src/strategies/strategies-nal.c
	src/strategies/strategies-nal.h
2015-08-11 08:14:35 +03:00
Marko Viitanen 57ab46f110 Small fixes all around to enable 10bit encoding
Conflicts:
	src/encmain.c
	src/encoder.c
	src/encoderstate.c
	src/global.h
2015-08-11 07:59:20 +03:00
Ari Lemmetti 5887c96991 Add and use 14bit reconstruction for fractional motion vectors with bipred 2015-08-10 18:45:29 +03:00
Ari Lemmetti 8b4a6c92da Add 14bit precision sample functions. 2015-08-10 18:02:06 +03:00
Ari Lemmetti b30f17d4b8 Add fractional pixel sampling for chroma 2015-08-10 17:55:37 +03:00
Ari Lemmetti 01f40ec104 Add fractional pixel sampling for luma 2015-08-10 17:51:48 +03:00
Ari Koivula 0c3c93d456 Optimize intra SAD intrinsics.
- Added 64x64 version for completeness.
- With the exception of 16x16, these were all slightly slower than the ASM
  versions, as measured by "kvazaar_test -s speed -t intra_sad", but now they
  are on par or slightly faster.
- None of these actually use any AVX2 intrinsics, and probably never will,
  unless someone adds an interface for doing more than one block at a time,
  in which case the non-destructive versions might come in handy.
2015-08-06 19:35:00 +03:00
Arttu Ylä-Outinen f7f17a060c Rename pixel_t to kvz_pixel. 2015-07-02 16:58:28 +03:00
Arttu Ylä-Outinen fab07d80da Rename macro BIT_DEPTH to KVZ_BIT_DEPTH. 2015-07-02 16:55:47 +03:00
Marko Viitanen 8ed5d06ebe Fixed compiler warnings caused by the bipred branch merge 2015-04-23 15:12:48 +03:00
Ari Lemmetti b9ec4b0a54 AVX2 acceleration for new luma filtering. 2015-03-11 15:33:38 +02:00
Ari Lemmetti 39eceec38d Rewrite of luma fractional pixel filtering. Utilizes intermediate values instead of calculating everything again. 2015-03-06 17:58:22 +02:00
Ari Koivula ded6fd9ee8 Renamed typedef pixel to pixel_t. 2015-03-04 16:35:53 +02:00
Ari Koivula f6147b410a Rename struct encoder_control to encoder_control_t.
Conflicts:
	src/encoder_state-geometry.h
	src/encoderstate.h
2015-03-04 14:01:14 +02:00
Ari Koivula d7383ccb25 Change license to LGPL.
- Everyone who has contributed code to the project has been asked to license
  their contributions under LGPL and they have agreed.

- COPYING file changed to say LGPLv2.1 instead of GPLv2.

- GPL changed to LGPL in the header of every single file that has a header,
  and a header added to the few that were missing one.

- Also.. Happy new year!
2015-02-25 15:19:05 +02:00
Ari Lemmetti 7430622038 Copy ipol-generic strategy as a base for avx2 strategy 2015-02-05 13:28:07 +02:00
Ari Lemmetti 8495870df8 Using BIT_DEPTH macro because it is constant 2015-02-05 13:19:54 +02:00
Ari Lemmetti c82adae0c4 Use four tap functions in octpel chroma interpolation 2015-02-04 18:23:57 +02:00
Ari Lemmetti 2f11caeb73 Added generic four tap functions. Use them in halfpel chroma interpolation. 2015-02-04 17:50:12 +02:00
Ari Lemmetti 041d970ece Apply fast clipping also to chroma filtering. 2015-01-29 16:19:04 +02:00
Ari Lemmetti c21351cc12 Added fast clipping function for clamping values to bit depth. 2015-01-21 17:53:06 +02:00
Ari Lemmetti f037ed580c Improved data layout 2015-01-15 16:31:18 +02:00
Ari Lemmetti 465f718eeb Move value clipping away from separate loop 2015-01-15 16:14:00 +02:00
Ari Lemmetti 9d12ce21d5 Cleaned luma interpolation, added functions for 8-tap filtering. 2015-01-15 16:13:12 +02:00
Ari Lemmetti 0e56d13b5d Use smaller bit depth for fractional pixel interpolation 2015-01-15 15:00:09 +02:00
Ari Lemmetti cc061b4c3d Added ipol strategy for interpolation filters.
Added initial files for AVX2 and generic strategies.
2015-01-15 14:59:37 +02:00
Ari Koivula fcb6fa6d4b Fix compilation error on PowerPC.
- Need abs from stdlib.
2014-10-21 18:14:32 +03:00