Pauli Oikkonen
|
3a1f2eb752
|
Prefer SSE4.1 implementation of SAD over AVX2
It seems that the 128-bit wide version consistently outperforms the
256-bit one
|
2019-01-10 13:48:55 +02:00 |
|
Pauli Oikkonen
|
9b24d81c6a
|
Use SSE instead of AVX for small widths
Highly dubious if this will help performance at all
|
2019-01-07 20:12:13 +02:00 |
|
Pauli Oikkonen
|
887d7700a8
|
Modify AVX2 SAD to mask data by byte granularity in AVX registers
Avoids using any SAD calculations narrower than 256 bits, and
simplifies the code. Also improves execution speed
|
2019-01-07 18:53:15 +02:00 |
|
Pauli Oikkonen
|
7585f79a71
|
AVX2-ize SAD calculation
Performance is no better than SSE though
|
2019-01-07 16:26:24 +02:00 |
|
Pauli Oikkonen
|
ab3dc58df6
|
Copy SAD SSE4.1 impl to AVX2
|
2019-01-03 18:31:57 +02:00 |
|
Reima Hyvönen
|
1fcc5c6a8d
|
Merge branch 'bipred_recon'
|
2018-12-11 09:59:35 +02:00 |
|
Reima Hyvönen
|
e4a10880f3
|
Added case 12 to bipred_recon no mov
|
2018-12-11 09:52:17 +02:00 |
|
Reima Hyvönen
|
f8696b54a4
|
Updated bipred_recon_avx2 in avx2/picture-avx2.c. It now detects block widths other than 8 (e.g. width = 12)
|
2018-11-20 17:09:19 +02:00 |
|
Reima Hyvönen
|
710ba288db
|
Chroma has some problems
|
2018-11-15 16:42:48 +02:00 |
|
Ari Lemmetti
|
5c774c4105
|
Rewrite most of FME and interpolation filters
Changes had to break a lot of stuff and were just squashed into this horrible code dump
|
2018-11-08 20:21:16 +02:00 |
|
Reima Hyvönen
|
7406c33a42
|
Some more cleaning
|
2018-10-26 12:25:18 +03:00 |
|
Reima Hyvönen
|
4c71546b2e
|
Cleaned up some code
|
2018-10-26 12:19:44 +03:00 |
|
Reima Hyvönen
|
4fe3909e48
|
Switched luma to use 32-bit ints instead of 16-bit
|
2018-10-24 18:24:46 +03:00 |
|
Reima Hyvönen
|
381e786e10
|
Trying to find the bug in luma
|
2018-10-11 18:08:41 +03:00 |
|
Reima Hyvönen
|
2f5f81bac3
|
Removed the non-optimized bipred function
|
2018-10-09 11:19:23 +03:00 |
|
Reima Hyvönen
|
212a8e68fa
|
Modified to avoid memory overflow; there is still a bug in luma
|
2018-10-02 20:23:32 +03:00 |
|
Reima Hyvönen
|
896034b7cf
|
Renamed some functions back
|
2018-08-28 15:31:10 +03:00 |
|
Reima Hyvönen
|
7de5c74434
|
Updated bipred_recon to work faster
|
2018-08-28 15:12:31 +03:00 |
|
Reima Hyvönen
|
2ca99a44e8
|
Updated shuffle operation to be in right order
|
2018-08-27 18:16:38 +03:00 |
|
Reima Hyvönen
|
508b218a12
|
some modifications made to prevent reading too much
|
2018-08-14 10:50:39 +03:00 |
|
Reima Hyvönen
|
1d935ee888
|
some useless stuff removed
|
2018-08-13 16:47:11 +03:00 |
|
Reima Hyvönen
|
ce3ac4c05e
|
some modifications to no_mov
|
2018-08-13 16:41:02 +03:00 |
|
Reima Hyvönen
|
15a613ae94
|
test if no_mov breaks testing
|
2018-08-13 16:02:56 +03:00 |
|
Reima Hyvönen
|
97a2049e58
|
removed pointer declaration out from switch
|
2018-08-10 16:42:26 +03:00 |
|
Reima Hyvönen
|
aa94bcedbc
|
Stream is now pointer
|
2018-08-10 16:38:49 +03:00 |
|
Reima Hyvönen
|
fa5b227ece
|
256 to 32 doesn't work, made them by hand
|
2018-08-10 16:01:20 +03:00 |
|
Reima Hyvönen
|
408dedbcc8
|
removed _mm256_extract_epi8 and replaced with _mm_stream
|
2018-08-10 15:53:26 +03:00 |
|
Reima Hyvönen
|
31c35091c6
|
_mm256_cvtsi256_si32 removed
|
2018-08-10 10:06:40 +03:00 |
|
Reima Hyvönen
|
99dc43074f
|
_mm256_cvtsi256_si32 breaks system, too many bits. Back to extract
|
2018-08-10 09:59:33 +03:00 |
|
Reima Hyvönen
|
4f1f80b2cb
|
Transformed convert from 256 to cast 256 -> 128 and then convert from 128
|
2018-08-09 15:35:54 +03:00 |
|
Reima Hyvönen
|
4957555eb3
|
Removed leftover from 939
|
2018-08-09 15:25:03 +03:00 |
|
Reima Hyvönen
|
28b165c971
|
Clarified some sections, added _MM_SHUFFLE macro
|
2018-08-09 15:23:01 +03:00 |
|
Reima Hyvönen
|
dd04df8667
|
testing if error in both avx2 functions
|
2018-08-03 11:49:00 +03:00 |
|
Reima Hyvönen
|
ed50d71fde
|
Switched some variables to different location, altered inter_recon_bipred_avx2 function
|
2018-08-02 16:08:59 +03:00 |
|
Reima Hyvönen
|
f5739a0028
|
Renaming and removing useless prints
|
2018-08-02 14:47:17 +03:00 |
|
Reima Hyvönen
|
bc09f59bb6
|
Edited some definitions
|
2018-08-02 11:54:53 +03:00 |
|
Reima Hyvönen
|
a4bf77f208
|
Tested some extract functions
|
2018-07-12 09:29:32 +03:00 |
|
Reima Hyvönen
|
c05033a893
|
Even more useless vectors removed
|
2018-07-11 15:09:14 +03:00 |
|
Reima Hyvönen
|
884cb77238
|
Removed some unused vectors
|
2018-07-11 15:06:11 +03:00 |
|
Reima Hyvönen
|
792689a5ff
|
Removed for-loops, added extract instead
|
2018-07-11 14:56:41 +03:00 |
|
Reima Hyvönen
|
f9c7f6ee66
|
Added some break operations for AVX2 optimization
|
2018-07-11 14:15:38 +03:00 |
|
Reima Hyvönen
|
cc064da143
|
Some more optimization for bipred
|
2018-07-11 11:27:54 +03:00 |
|
Reima Hyvönen
|
a22cf03ddb
|
Added a no-movement function to the AVX2 strategies
|
2018-07-10 16:07:15 +03:00 |
|
Reima Hyvönen
|
ea83ae45f0
|
Working solution
|
2018-07-03 11:18:51 +03:00 |
|
Reima Hyvönen
|
17babfffa4
|
25.6 working optimization, ~50% faster than original
|
2018-06-25 17:06:16 +03:00 |
|
Ari Lemmetti
|
70a52f0e48
|
10-bit: add missing bit depth adjustment to ssd
|
2016-11-17 19:28:04 +02:00 |
|
Ari Lemmetti
|
29153ed503
|
Remove unused variable
|
2016-10-21 17:28:42 +03:00 |
|
Ari Lemmetti
|
778e46dfd8
|
Add AVX2 version of SSD
|
2016-10-21 15:07:53 +03:00 |
|
Ari Lemmetti
|
89b941eab4
|
Fix typo
|
2016-10-21 15:07:02 +03:00 |
|
Ari Lemmetti
|
28c4174d0e
|
Fix incorrect shuffle parameters
_MM_SHUFFLE uses reverse order
|
2016-08-23 19:40:46 +03:00 |
|