Commit graph

1000 commits

Author SHA1 Message Date
Laurent Fasnacht 5ee1319c08 Altivec detection 2014-06-03 07:55:39 +02:00
Laurent Fasnacht 58ad3b4d26 Log more performance data, also plot how many threads are running 2014-06-03 07:42:22 +02:00
Laurent Fasnacht 5ed69b063b Strategy selector for array_checksum, basic implementation using precomputed 256*256 block with larger accesses than byte 2014-06-03 07:42:22 +02:00
Ari Koivula a483e8cb0f Move cpuid stuff away from compiler namespace.
Conflicts:
	src/strategyselector.c
2014-05-30 10:08:14 +03:00
Marko Viitanen 6a72f87028 Merge commit '792a5a5dd1946a327f22b2daba05c6645dfa8037' 2014-05-30 08:47:01 +03:00
Marko Viitanen 792a5a5dd1 Small fix for __get_cpuid() 2014-05-30 08:37:03 +03:00
Laurent Fasnacht 642564b6fb Remove unused variable 2014-05-28 15:04:45 +02:00
Laurent Fasnacht 4f86919d75 Get rid of assembly cpuid for x86, compilation works for powerpc 2014-05-28 15:04:00 +02:00
Ari Koivula e585da37e5 Give correct transform depth to RDOQ.
Conflicts:
	src/search.c
2014-05-28 15:47:49 +03:00
Ari Koivula dceb3da9b8 Fix bug in search relating to transform with no non-zero coefficients.
- Because cost was calculated even though there were no coefficients, these
  very good modes were less likely to be selected.

- Added assert to encode_coeff_nxn to avoid these problems in the future.
2014-05-28 15:22:18 +03:00
Ari Koivula ddc02cc09e Avoid regenerating reference pixels for every rdo mode. 2014-05-22 13:18:28 +03:00
Ari Koivula dbe13d0cba Separate sad intra search from rdo search. 2014-05-22 12:47:45 +03:00
Ari Koivula 19ce21e07c Split final cost to luma and chroma functions. 2014-05-22 09:45:00 +03:00
Ari Koivula a6962e2974 Separate intra transform coding to luma and chroma functions. 2014-05-22 09:40:34 +03:00
Laurent Fasnacht 3a30a886fc FREE_POINTER of job->rdepends was at the wrong place (memory leak) 2014-05-22 07:15:18 +02:00
Laurent Fasnacht 3b38777b71 Fix condition depending on uninitialized value in SAO 2014-05-21 16:33:24 +02:00
Laurent Fasnacht 66e730ba94 Fix encoder_state_init, which was making out of bound reads 2014-05-21 14:23:36 +02:00
Laurent Fasnacht 37c20b8ce5 Add dependency between SAO rows 2014-05-21 13:52:56 +02:00
Laurent Fasnacht 90f46dc56f Threadqueue now has a start index to the first queue job. It improves the speed a little 2014-05-21 12:02:55 +02:00
Laurent Fasnacht f4f9093cb5 Parallel SAO 2014-05-21 11:48:29 +02:00
Laurent Fasnacht a3fcb141ed lcu_order_element now has pointer to neighbor LCUs 2014-05-21 11:06:53 +02:00
Ari Koivula de76d0a294 Don't add dependency to the above LCU in wavefront if it's not necessary.
- The top-right LCU already has dependency to the top LCU.
2014-05-20 10:48:19 +03:00
Laurent Fasnacht bdc2d43180 Write bitstream directly after doing the search. This is required since we need the correct entropy status for wpp 2014-05-20 09:29:01 +02:00
Laurent Fasnacht 06532292fc Wavefronts are in tile coordinates 2014-05-20 09:28:58 +02:00
Ari Koivula 4751a3744b Fix intra mode search not doing boundary smoothing for DC.
- Move the boundary smoothing to the prediction function to make sure it's not
  forgotten.
2014-05-19 16:23:17 +03:00
Ari Koivula f9a603e4ea Move intra mode search from intra module to search module.
- Make the actual intra prediction function global.

- Move the rdo stuff to rdo module.
2014-05-19 16:12:02 +03:00
Ari Koivula 1da94f2085 Stop deblocking from filtering edges not on 8x8 grid. 2014-05-19 15:58:54 +03:00
Ari Koivula 2224e18a46 Make deblocking work with transform splits.
- It used to work only with the implicit transform split from LCU size.
2014-05-19 15:58:54 +03:00
Ari Koivula 656b0a321b Add chroma mode to lcu_set_intra_mode.
- This is needed for intra split.
2014-05-19 15:58:54 +03:00
Ari Koivula 921f58b249 Add tr_split to lcu_set_intra_mode. 2014-05-19 15:58:54 +03:00
Ari Koivula 846b608125 Add transform split recursion to intra reconstruction. 2014-05-19 15:58:54 +03:00
Ari Koivula 63f6cad5a0 Include global.h in thread modules. 2014-05-19 15:58:16 +03:00
Ari Koivula 551b087b47 Remove bunch of unnecessary code from encode_transform_unit.
- Really, it's useless. Selecting scan order isn't this hard.

- Checked from HM that ctx_idx doesn't have anything to do with contexts.
2014-05-16 17:42:40 +03:00
Ari Koivula f73bef0941 Remove unused include. 2014-05-16 16:09:59 +03:00
Laurent Fasnacht 6fdb821b14 Fix memory leaks 2014-05-16 12:20:40 +02:00
Laurent Fasnacht d4a6aed471 Multi-row jobs 2014-05-16 12:20:40 +02:00
Marko Viitanen 94285fbed7 Fixed compiling on visual studio with _DEBUG defined 2014-05-16 12:22:06 +03:00
Marko Viitanen 86155ef1ba Added windows specific timing macros for thread debugging 2014-05-16 12:16:22 +03:00
Laurent Fasnacht 36945e89ce Stubs to be able to make a portable version of the profiling 2014-05-16 10:15:05 +02:00
Laurent Fasnacht 53b0835316 Improve handling of jobs when not using threads 2014-05-16 08:50:43 +02:00
Laurent Fasnacht 519750d630 Write bitstream of a wavefront in a parallel way 2014-05-16 08:50:42 +02:00
Laurent Fasnacht 7473ac1bfc Able to log time in a simple way 2014-05-16 08:50:42 +02:00
Laurent Fasnacht 86e01284b8 Add -lrt 2014-05-16 08:48:54 +02:00
Laurent Fasnacht 4f73a7fc91 Instrument threads in order to be able to do some visualization 2014-05-16 08:44:32 +02:00
Ari Koivula a7cd31d87b Update the names of some bins to the current spec.
- Helps with debugging.
2014-05-16 05:44:03 +03:00
Ari Koivula ab4041c8fc Change cabac debug statements to show information better.
- Show the number of bits when encoding multiple bins. I would like just the
  bits themselves in string form, but that's too much trouble for this.

- Print them as unsigned and coerce them to unsigned, as they are going to
  get coerced to unsigned by the function call anyway.

- Change state to be less verbose.
2014-05-16 05:44:03 +03:00
Ari Koivula c9a8756fbd Fix NxN scan mode for lcu_get_final_cost.
- Scan mode was always selected according to the first PU mode.
2014-05-15 16:20:35 +03:00
Marko Viitanen b08047cce9 Fixed intra chroma mode selection 2014-05-15 09:50:05 +03:00
Ari Koivula f0e990905e Remove chroma mode "36".
- It's an unnecessary chore to handle this special case everywhere (it means
  chroma_mode == intra_mode). Better just to use the actual mode.
2014-05-14 19:56:35 +03:00
Ari Koivula 60a0ba4280 Update VS project files to link win32-pthread.
- I haven't found a good way of including external dependencies to VS projects
  yet. Win32-pthreads is assumed to be found at the same level as kvazaar dir
  and has the files x86/pthreadVC2.lib and x64/pthreadVC2.lib.

- Win32-pthreads also requires the pthreadVC2.dll to be in PATH when running
  the program. Not sure what to do about that yet. We might need an installer
  for windows to handle that.

- Disable openmp as it's no longer used.

- Stop linking Ws2_32.lib as that hasn't been used for ages.
2014-05-14 17:54:34 +03:00
Laurent Fasnacht 8ff9ea0eee Wavefront works with parallelism + deblock (still no SAO) 2014-05-14 14:01:26 +02:00
Laurent Fasnacht 38444a81a6 Threads should be put in queue in wait state if we want to add dependencies later 2014-05-14 14:01:25 +02:00
Laurent Fasnacht e72408249b Add encoder_state pointer to lcu_order_element, new worker_encoder_state_search_lcu function to run the search stuff on one LCU 2014-05-14 14:01:24 +02:00
Laurent Fasnacht eb62696461 Fix problems when image dimensions are not a multiple of LCU 2014-05-14 13:27:14 +02:00
Laurent Fasnacht 1ba1683c05 search buffer has to be allocated tile-wise to avoid problems with wavefronts 2014-05-14 13:27:13 +02:00
Laurent Fasnacht bb86f24000 Take advantage of the new buffers to remove unneeded item assignment 2014-05-14 13:27:13 +02:00
Laurent Fasnacht 6607c9f563 Use new buffers for search 2014-05-14 13:27:12 +02:00
Laurent Fasnacht c257c4b863 Add const for the buffers 2014-05-14 13:27:12 +02:00
Laurent Fasnacht 1680273e80 Store search borders in a buffer for the whole picture 2014-05-14 13:27:11 +02:00
Laurent Fasnacht 0ceb1469a2 Improve decision about when to split into threads 2014-05-14 13:27:11 +02:00
Laurent Fasnacht d4a303e7e6 Free jobs as soon as possible 2014-05-14 13:27:09 +02:00
Laurent Fasnacht 63adb54a3d Add --threads <int> command line parameter 2014-05-14 13:27:09 +02:00
Laurent Fasnacht e772799d5e encoder_state_encode now uses the threadqueue 2014-05-14 13:27:08 +02:00
Laurent Fasnacht baede7f6c4 threadqueue 2014-05-14 13:27:08 +02:00
Laurent Fasnacht 8b7774153f Add SLEEP() define 2014-05-14 13:27:08 +02:00
Laurent Fasnacht aac7fc55b1 Remove filter_deblock function, which is not used and somewhat dangerous, since it doesn't take into account specific stuff about subencoders. 2014-05-14 13:27:07 +02:00
Laurent Fasnacht bc3ca90bdf Fix tiles when SAO or deblock is enabled.
Was broken by previous commit.
2014-05-14 13:27:07 +02:00
Laurent Fasnacht 4815a0604b Entropy coding sync works without parallelism, without SAO and without deblocking 2014-05-14 13:27:06 +02:00
Laurent Fasnacht 2c2a2528f3 Remove openmp stuff 2014-05-14 13:27:06 +02:00
Ari Koivula aee9bf2875 Re-add rdo control to transformskip decision.
- It got left out when rewriting the function.
2014-05-14 12:39:23 +03:00
Ari Koivula 9147b7acbf Split residual quantization to separate luma and chroma function. 2014-05-14 11:19:48 +03:00
Ari Koivula e947bd4c0e Clean up trskip decision code and remove old code.
- You can define structs inside functions! This changes everything!!

- Bitstream changes a little bit compared to old trskip decision. Bdrate
  change is insignificant though.
2014-05-13 22:00:04 +03:00
Ari Koivula a3cdee9ec5 Move new trskip decision to a function. 2014-05-13 21:59:00 +03:00
Ari Koivula 2ff713ccb2 Add new implementation for trskip decision. 2014-05-13 21:57:45 +03:00
Ari Koivula 8b8da6f493 Make luma and chroma use the same quantization function.
- Only thing not working was transform skip.
2014-05-13 21:57:23 +03:00
Ari Koivula f0bfcedba2 Clean up coeff reconstruction code. 2014-05-13 21:56:10 +03:00
Ari Koivula 0c65a9b658 Remove abs_sum from coeff quantization.
- It's meant for checking if there are any coefficients, but we don't use it
  and it's annoying to remember to initialize it and pass it around. The
  benefit should be quite small anyway.
2014-05-13 21:54:34 +03:00
Ari Koivula 75042fc65d Move luma quantization to its own function. 2014-05-13 21:34:06 +03:00
Ari Koivula ba3aaf3189 Expand chroma functions to parent function.
- This was done so that making the function work with luma would be easier.
2014-05-13 21:30:14 +03:00
Ari Koivula 637aceb495 Add TR_MAX_WIDTH.
- Max transform size is constrained by but independent of LCU size.

- Luma and chroma now have the same stride for transform arrays.
2014-05-13 21:22:40 +03:00
Ari Koivula 1c38209cab Add missing include. 2014-05-13 09:33:05 +03:00
Ari Koivula 13577562e5 Revert change to definition of LCU_WIDTH. 2014-05-13 09:28:01 +03:00
Ari Koivula fb763f7940 Move coefficient generation functions from encoder.c to transform.c.
- These functions probably should have been there to begin with.
2014-05-12 11:37:39 +03:00
Ari Koivula a3478ecd20 Move transform skip decision to its own function. 2014-05-12 11:18:27 +03:00
Ari Koivula d9b890de6e Remove redundant variables.
- Redefine LCU_WIDTH to be 64. Stuff will break horribly if it's
  anything else anyway.

- Add LCU_WIDTH_C for chroma LCU width. It should be more readable than the
  constant (LCU_WIDTH >> 1).
2014-05-12 10:58:07 +03:00
Ari Koivula 59e0e98523 Separate luma and chroma coefficient generation variables. 2014-05-12 10:38:24 +03:00
Ari Koivula 0ca65e7606 Move chroma coefficient generation to its own function.
- It's time to chop up this monster that is encode_transform_tree.
2014-05-12 10:24:06 +03:00
Ari Koivula 3c3c9a26c6 Move scan order selection to a function. 2014-05-12 08:47:16 +03:00
Ari Koivula 623d9001a8 Reorder chroma coefficient generation. 2014-05-12 08:47:16 +03:00
Ari Koivula 93141c7d2e Avoid unnecessary copying of predicted pixels when there are no coeffs.
- These are probably from a time when reconstruction happened in this
  function.
2014-05-09 16:39:58 +03:00
Ari Koivula 27ab882c25 Clean up coefficient generation. 2014-05-09 16:33:10 +03:00
Ari Koivula ce945ab4ef Handle coefficient initialization better.
- Coefficients are no longer required to be pre-zeroed. The resulting zeroes
  are copied in even in the case where we already know they are all zeroes.

- Move cbf clearing code to only happen at the leaves of the recursion.
2014-05-09 16:30:28 +03:00
Laurent Fasnacht b274558139 Refactor and fix entry_points functions.
Seems to be OK with HM now
2014-05-09 12:42:37 +02:00
Laurent Fasnacht 43b5f84c0d Fix sao_calc_edge_block_dims
It was computing wrong dimensions, which was causing out-of-bounds reads in sao_reconstruct.
2014-05-09 10:30:34 +02:00
Laurent Fasnacht 3f975e92cd Replace line fixing symptoms by assertions, to reveal the cause 2014-05-09 08:24:03 +02:00
Laurent Fasnacht 4dbf7c7a52 Fix blit dimensions in sao_search_best_mode 2014-05-09 08:24:02 +02:00
Ari Koivula cb5d7e6541 Fix compilation for VS2010. 2014-05-08 17:28:12 +03:00
Laurent Fasnacht 0452806ec4 Entry points 2014-05-08 15:04:56 +02:00
Laurent Fasnacht da588af2ba Partial support for wavefront 2014-05-08 15:04:55 +02:00
Laurent Fasnacht 4de5660254 Fix missing offset in LCU range computation for wavefronts 2014-05-08 15:04:55 +02:00
Laurent Fasnacht dc34a5eac6 LCU borders 2014-05-08 15:04:54 +02:00
Laurent Fasnacht 24f4a8cad1 Wavefront also needs entrypoints 2014-05-08 15:04:53 +02:00
Laurent Fasnacht d05f8b52aa Rewrite of encoder_state_write_bitstream_leaf: handle slice + tiles + wavefronts correctly 2014-05-08 15:04:53 +02:00
Laurent Fasnacht 27f694e3e8 Some initial code to support wpp and slices 2014-05-08 15:04:52 +02:00
Laurent Fasnacht b3d1754cc3 context_copy function 2014-05-08 15:04:51 +02:00
Laurent Fasnacht 163189c3c7 Bitstream for leaves can be computed in parallel 2014-05-08 15:04:51 +02:00
Laurent Fasnacht be9882f5b2 Leaf bitstream write 2014-05-08 15:04:50 +02:00
Laurent Fasnacht ae6a7a9c4b Leaf encoder uses encoder_state->lcu_order 2014-05-08 15:04:49 +02:00
Laurent Fasnacht b740142325 Add is_leaf to encoder_state 2014-05-08 15:04:48 +02:00
Laurent Fasnacht 8451d5b100 Move some init code to encoder_state_new_frame 2014-05-08 15:04:48 +02:00
Laurent Fasnacht 1cb3f14dfe lcu_order_count in (leaves) encoder 2014-05-08 15:04:47 +02:00
Laurent Fasnacht ef6ae3e723 Remove dead code 2014-05-08 15:04:46 +02:00
Ari Koivula 535b42bc9b Fix compilation for VS2010. 2014-05-07 15:26:44 +03:00
Laurent Fasnacht 05eef82896 Remove extra [ from graphviz dump 2014-05-07 13:40:29 +02:00
Laurent Fasnacht 84e5dbee39 Remove quote from graphviz dump 2014-05-07 13:33:02 +02:00
Laurent Fasnacht b48a687d3c Restored parallelism, but it will be done in another way... OpenMP is not very efficient in this kind of dynamic situation 2014-05-07 11:55:56 +02:00
Laurent Fasnacht 0e6f1c99fc Refactor picture to remove hidden dependency between slice and tiles
picture.type -> encoder_state->global->pictype
picture.slicetype -> encoder_state->global->slicetype
picture.slice_sao_luma_flag -> 1 (was constant)
picture.slice_sao_chroma_flag -> 1 (was constant)

This may be changed later. For now it's better to avoid having slice related stuff in picture.
2014-05-07 11:55:48 +02:00
Laurent Fasnacht 39d96e0546 Fix bug with cabac stream pointing to bad data 2014-05-07 11:55:41 +02:00
Laurent Fasnacht e144f817ef Works when not using tiles 2014-05-07 11:55:16 +02:00
Laurent Fasnacht 24c2bd70ca Fix small bugs with compilation 2014-05-07 11:54:35 +02:00
Laurent Fasnacht a03f0cba19 encoder_control_input_init near the other encoder_control_* functions 2014-05-07 11:53:21 +02:00
Laurent Fasnacht 1e2671ac30 Renamed encoder_clear_refs to encoder_state_clear_refs 2014-05-07 11:53:12 +02:00
Laurent Fasnacht 831b221cf8 Parsing seems to work now 2014-05-07 11:53:01 +02:00
Laurent Fasnacht 8b5cb62237 Debug code to generate a graph 2014-05-07 11:52:04 +02:00
Laurent Fasnacht cee6bb0e71 Fix iteration on children 2014-05-07 11:49:14 +02:00
Laurent Fasnacht 699669ee35 fixed typo 2014-05-07 11:48:16 +02:00
Laurent Fasnacht 6c6adf18c7 Refactor encoder_state 2014-05-07 11:47:31 +02:00
Laurent Fasnacht a23edd0339 added parent to encoder_state 2014-05-07 11:42:54 +02:00
Laurent Fasnacht 5ce518a47a lcu_at_tile_start and lcu_at_tile_end helper functions 2014-05-07 11:42:30 +02:00
Laurent Fasnacht c2872bd6b0 Slices and WPP in command line and encoder 2014-05-07 11:42:04 +02:00
Laurent Fasnacht 2d6f199246 reorganized encoder_state structure 2014-05-07 11:41:27 +02:00
Laurent Fasnacht f0b076876f Moved all the stream related stuff into substream_write_bitstream 2014-05-07 11:40:20 +02:00
Laurent Fasnacht f30b9c2a11 Fix a buffer overflow in parse_tiles_specification 2014-05-07 11:39:45 +02:00
Ari Koivula eaf8835bda Add some comments and const qualifiers. 2014-05-06 19:20:38 +03:00
Ari Koivula 3910b7989a Clear old cbf data before recursion in encode_transform_tree.
- Because encode_transform_tree also maintains the CBF data and assumes that
  the CBFs are initially zeroed, calling the function more than once would
  result in incorrect CBF data.
2014-05-06 19:03:29 +03:00
Ari Koivula bdc16d2612 Improve cu_info coded block flag data structure a bit.
- It works just like the old structure except that the flags are checked with
  bitmasks instead of having the flag value be propagated upwards. There isn't
  really any benefit to this because the flags still have to be propagated to
  parent CUs.

- Wrapped them inside a struct to make copying them easier. (Just need to copy
  the struct instead of making individual copies)
2014-05-06 18:28:04 +03:00
Ari Koivula d123b98aea Remove unnecessary ternary expressions from usages of CABAC_BIN. 2014-05-06 17:39:25 +03:00
Ari Koivula 380401b2eb Have CABAC_BIN accept any >0 as binary 1.
It used to treat odd numbers as false.
2014-05-06 17:39:10 +03:00
Marko Viitanen bf2c2a1330 Small changes to fix compiling on VS
- Added threads.h to VS project
- Included Windows.h in threads.h
2014-05-05 11:18:43 +03:00
Laurent Fasnacht f3d4e6eb09 Move bitstream write to a separate function, and add assertions about the part which should not write to bitstream. 2014-05-05 09:24:57 +02:00
Laurent Fasnacht 0fe080ad0a bitstream_tell 2014-05-05 08:53:06 +02:00
Laurent Fasnacht 7f6f4fe9c1 Reference count for picture 2014-05-05 08:03:24 +02:00
Laurent Fasnacht 323054d5e2 naming: alloc_yuv_t -> yuv_t_alloc dealloc_yuv_t -> yuv_t_free 2014-05-02 11:45:27 +02:00
Laurent Fasnacht 7d6d1d5536 Remove pic->pred_* 2014-05-02 11:38:07 +02:00
Laurent Fasnacht 92e14cc80d rename picture_init to picture_alloc and picture_destroy to picture_free 2014-05-02 10:58:28 +02:00
Laurent Fasnacht b76f7377b6 Always initialize tiles data structures (even with only one tile) 2014-05-02 10:00:22 +02:00
Laurent Fasnacht f97e60a80d Doc for encoder state 2014-05-02 10:00:12 +02:00
Laurent Fasnacht 161fe38f5e Remove USE_TILES define 2014-05-01 13:58:13 +02:00
Laurent Fasnacht a84fd6486d Add function subencoder_blit_pixels 2014-05-01 11:16:11 +02:00
Laurent Fasnacht b8b28635ff Iterable structure for sub-encoders (more flexibility) 2014-05-01 11:16:10 +02:00