118 commits

Author SHA1 Message Date
Ari Lemmetti a3855652e9 Add AVX2 version with separate handling of basic blocks and strideless copy. 2015-11-04 10:07:25 +02:00
Ari Lemmetti 0816fbea2c Create generic strategy of blit function 2015-11-04 10:07:25 +02:00
Marko Viitanen 821d5c478b Added missing parameter to kvz_strategy_register_picture_generic() 2015-11-02 08:55:54 +02:00
Ari Lemmetti d71f1b5bd0 Disable incompatible optimizations for 32-bit version 2015-10-24 15:32:27 +03:00
Ari Lemmetti df995d85e8 Utilize AVX2 for dequantization. 2015-10-23 20:17:08 +03:00
Ari Lemmetti cf347e33c4 Move dequant to strategies. Copy generic to AVX2 as well. 2015-10-23 19:53:50 +03:00
Ari Lemmetti 47082738aa ...and the same tricks for quantized reconstruction 2015-10-23 19:44:38 +03:00
Ari Lemmetti 7961ba80d8 Add functions for bigger block sizes to calculate more residual simultaneously and reduce memory accesses 2015-10-23 19:11:56 +03:00
Ari Lemmetti 15edd5060d Load and store multiple elements simultaneously. Use 128-bit wide zero test. *wip* 2015-10-23 17:03:16 +03:00
Ari Lemmetti b37cca87c8 Copy generic to avx2 2015-10-23 17:03:15 +03:00
Ari Lemmetti cad2ea9d6e Move quantize_residual to quant strategies. 2015-10-23 17:03:15 +03:00
Ari Lemmetti 0c63041ba7 Add filtering functions for different block sizes. Simplify logic a bit to reduce branching. Sorry for the large commit! 2015-10-23 16:54:15 +03:00
Ari Lemmetti 5af7a42ebe Enable AVX2 strategy. Add first version of optimizations. 2015-10-08 12:36:20 +03:00
Ari Lemmetti f4fe3dca5e Add AVX2 strategy. Copy generic implementation there. 2015-10-08 12:36:15 +03:00
Ari Lemmetti 54e8b346a3 Add intra strategy. Move angular prediction there. 2015-10-08 12:36:05 +03:00
Ari Lemmetti 38106afa50 Add AVX2 version of quantization. 2015-10-02 16:18:52 +03:00
Ari Lemmetti ef0ad292ef Add quantization strategy. 2015-10-02 16:17:02 +03:00
Ari Lemmetti 989cee1b04 Add 4x4 function as well 2015-10-01 22:14:56 +03:00
Ari Lemmetti 8b57b2bb1a Refactor SATD to inline most of the function. Replace full horizontal add with shuffle and regular packed add. 2015-10-01 21:29:25 +03:00
Ari Lemmetti 55da2a9958 Add intrinsic version of SATD for 8x8 and larger blocks 2015-10-01 19:42:22 +03:00
Ari Lemmetti d68fc4c41e Add header for common utilities to use with strategies. 2015-10-01 19:40:35 +03:00
Ari Koivula 9a23ae3d92 Resolve remaining Visual Studio warnings.
- Ignore most of them and fix the ones that can't be ignored.
2015-08-31 15:02:25 +03:00
Arttu Ylä-Outinen 3a10e9e3e0 Prefix all non-static symbols with "kvz_". 2015-08-26 13:02:28 +03:00
Arttu Ylä-Outinen bfe2b31cee Make generic satd functions static. 2015-08-26 12:10:27 +03:00
Ari Lemmetti 923f4a74d5 Fix filtering over limits 2015-08-17 17:39:56 +03:00
Ari Lemmetti 82cf4e8ff4 Output error messages to stderr 2015-08-17 15:01:46 +03:00
Ari Lemmetti 3da71b62bf Add checks if malloc fails 2015-08-17 15:01:46 +03:00
Ari Lemmetti 4718fe7fda Change variable names to match used convention 2015-08-17 15:01:46 +03:00
Ari Lemmetti 6a5eaf08de Rename extend_borders to get_extended_block. Add kvz_ prefix to type definition. 2015-08-17 15:01:46 +03:00
Ari Lemmetti d82582c37c Changes to extend border function.
Now outputs a pointer to a block with guaranteed padding for filtering.
Only generate extra pixels if samples are needed out of bounds.
Use memcpy otherwise.
2015-08-17 15:01:46 +03:00
Ari Lemmetti 5d96dbc6c0 Make strategy selection use bit depth given via parameter instead of excluding registration with defines 2015-08-12 13:33:38 +03:00
Ari Lemmetti 4122f36089 Prevent the registration of strategies that are incompatible when KVZ_BIT_DEPTH != 8
Remove unnecessary or misleading mentions of "8bit"
2015-08-12 11:29:53 +03:00
Ari Lemmetti 348d7780fc Remove third shift and offset from 14-bit sampling functions (change missing from rebase) 2015-08-11 15:06:16 +03:00
Marko Viitanen 8409317bd9 Fixed rebasing errors for 10bit branch 2015-08-11 14:56:45 +03:00
Marko Viitanen 6453a511d7 Scale SAD/SATD costs to match bit depth
Conflicts:
	src/image.c
2015-08-11 08:18:14 +03:00
Marko Viitanen 0304b6c412 Fixed luma interpolation filter when 10bit coding and some other minor fixes 2015-08-11 08:17:48 +03:00
Marko Viitanen 450b5e64ca Fixed overflow on generic ipol filters when 10bit encoding
Conflicts:
	src/strategies/generic/ipol-generic.c
2015-08-11 08:17:48 +03:00
Marko Viitanen 414ebe6101 Fixed checksum on bitdepth > 8 cases
Conflicts:
	src/nal.c
	src/nal.h
	src/strategies/generic/nal-generic.c
	src/strategies/strategies-nal.c
	src/strategies/strategies-nal.h
2015-08-11 08:14:35 +03:00
Marko Viitanen 57ab46f110 Small fixes all around to enable 10bit encoding
Conflicts:
	src/encmain.c
	src/encoder.c
	src/encoderstate.c
	src/global.h
2015-08-11 07:59:20 +03:00
Ari Lemmetti 5887c96991 Add and use 14bit reconstruction for fractional motion vectors with bipred 2015-08-10 18:45:29 +03:00
Ari Lemmetti 8b4a6c92da Add 14bit precision sample functions. 2015-08-10 18:02:06 +03:00
Ari Lemmetti b30f17d4b8 Add fractional pixel sampling for chroma 2015-08-10 17:55:37 +03:00
Ari Lemmetti 01f40ec104 Add fractional pixel sampling for luma 2015-08-10 17:51:48 +03:00
Ari Koivula 0c3c93d456 Optimize intra SAD intrinsics.
- Added 64x64 version for completeness.
- With the exception of 16x16, these were all slightly slower than the ASM
  versions, as measured by "kvazaar_test -s speed -t intra_sad", but now they
  are on par or slightly faster.
- None of these actually use any AVX2 intrinsics, and probably never will,
  unless someone adds an interface for doing more than one block at a time,
  in which case the non-destructive versions might come in handy.
2015-08-06 19:35:00 +03:00
Arttu Ylä-Outinen f7f17a060c Rename pixel_t to kvz_pixel. 2015-07-02 16:58:28 +03:00
Arttu Ylä-Outinen fab07d80da Rename macro BIT_DEPTH to KVZ_BIT_DEPTH. 2015-07-02 16:55:47 +03:00
Marko Viitanen 8ed5d06ebe Fixed compiler warnings caused by the bipred branch merge 2015-04-23 15:12:48 +03:00
Ari Lemmetti b9ec4b0a54 AVX2 acceleration for new luma filtering. 2015-03-11 15:33:38 +02:00
Ari Lemmetti 39eceec38d Rewrite of luma fractional pixel filtering. Utilizes intermediate values instead of calculating everything again. 2015-03-06 17:58:22 +02:00
Ari Koivula ded6fd9ee8 Renamed typedef pixel to pixel_t. 2015-03-04 16:35:53 +02:00