The main thread has to wait for the worker threads to finish. The
pthread_cond_timedwait call used for this was given a relative time
instead of an absolute one, so the call returned immediately because
the deadline had already passed.
This removes the now-unnecessary sleeps and passes an absolute deadline
to pthread_cond_timedwait, so that it waits until a job finishes or
100 ms have passed.

The OWF WPP limit code assumed square blocks, so it did not work
correctly when height != width. This changes the relevant code to
consider both height and width.
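
As an illustration of why the two dimensions have to be handled
separately, here is a rough sketch that maps a block's extent to the
last LCU column and row it touches. The function name, the lcu_pos_t
type, the margin parameter and the non-negative coordinates are all
assumptions made for this example; the actual limit code is not shown
in this message.

```c
#include <assert.h>

/* Hypothetical result type: LCU column and row indices. */
typedef struct {
  int col;
  int row;
} lcu_pos_t;

/* Hypothetical helper: last LCU column/row covered by a block at (x, y)
 * of size width x height, extended by `margin` pixels to the right and
 * below. Assumes x >= 0 and y >= 0. Width is used only for the
 * horizontal extent and height only for the vertical one. */
static lcu_pos_t last_lcu_needed(int x, int y, int width, int height,
                                 int margin, int lcu_size)
{
  lcu_pos_t pos;
  assert(width > 0 && height > 0 && lcu_size > 0 && margin >= 0);
  pos.col = (x + width - 1 + margin) / lcu_size;
  pos.row = (y + height - 1 + margin) / lcu_size;
  return pos;
}
```

For a 16x64 block at (0, 32) with 64-pixel LCUs, this yields row 1,
whereas using the width for both directions would yield row 0 and
underestimate how much of the reference has to be finished.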

Add MD5 through extras/libmd5, taken from HM under the BSD license. It
is implemented as a generic strategy using the same interface as
checksum, so we can write a SIMD version if it seems necessary.
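
The idea of several implementations behind one interface can be
sketched with function pointers. Everything named below (hash_func,
hash_strategy, select_strategy, the two sum functions) is hypothetical
and only illustrates the pattern; the hash is a plain byte sum, not
MD5 and not the encoder's checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_SIZE 16

/* Hypothetical common interface shared by every picture hash. */
typedef void (*hash_func)(const uint8_t *data, int width, int height,
                          int stride, uint8_t out[HASH_SIZE]);

/* One registered implementation; higher priority wins. */
typedef struct {
  const char *name;
  int priority;
  hash_func func;
} hash_strategy;

/* Store a 32-bit sum big-endian in the first four output bytes. */
static void store_sum(uint32_t sum, uint8_t out[HASH_SIZE])
{
  memset(out, 0, HASH_SIZE);
  out[0] = (uint8_t)(sum >> 24);
  out[1] = (uint8_t)(sum >> 16);
  out[2] = (uint8_t)(sum >> 8);
  out[3] = (uint8_t)sum;
}

/* Generic version: index every pixel through the stride. */
static void sum_generic(const uint8_t *data, int width, int height,
                        int stride, uint8_t out[HASH_SIZE])
{
  uint32_t sum = 0;
  for (int y = 0; y < height; ++y)
    for (int x = 0; x < width; ++x)
      sum += data[y * stride + x];
  store_sum(sum, out);
}

/* Stand-in for an optimized version: same result, row pointer access. */
static void sum_rowwise(const uint8_t *data, int width, int height,
                        int stride, uint8_t out[HASH_SIZE])
{
  uint32_t sum = 0;
  for (int y = 0; y < height; ++y) {
    const uint8_t *row = data + (size_t)y * stride;
    for (int x = 0; x < width; ++x)
      sum += row[x];
  }
  store_sum(sum, out);
}

/* Return the implementation with the highest priority, or NULL. */
static hash_func select_strategy(const hash_strategy *list, size_t count)
{
  const hash_strategy *best = NULL;
  assert(list != NULL || count == 0);
  for (size_t i = 0; i < count; ++i)
    if (best == NULL || list[i].priority > best->priority)
      best = &list[i];
  return best ? best->func : NULL;
}
```

Callers only ever hold a hash_func, so a faster implementation can be
registered later without touching the call sites.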

The previous reasoning combined deblocking and fractional motion
estimation to arrive at a margin of 4 pixels. This was wrong: with
either of them turned off, half-pixel chroma interpolation could use
pixels outside the intended region.
Deblocking does not currently affect the margin needed.

I was unclear about exactly what happens, and when, with SAO and
deblocking under frame-parallel WPP, so I checked and commented the
bits that were unclear to me.

The check was done against the wrong dimension, which allowed access
to unfinished parts of the frame when coding multiple frames at the
same time.

Add a new parameter, --tiles, that accepts only a uniform split. I
considered supporting the syntax of --tiles-width-split for this, but
writing --tiles=u2xu2 is just not as intuitive as --tiles=2x2, and
there is hardly ever any reason to use anything but a uniform split.
The more cumbersome --tiles-width-split and --tiles-height-split
parameters are still there to allow finer control.
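
A value such as 2x2 could be parsed along these lines. This is only a
sketch: parse_uniform_tiles is a hypothetical name, and reading the
first number as columns and the second as rows is an assumption, not
something stated in this message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parser for a uniform "<columns>x<rows>" tile value.
 * Returns 1 and fills the outputs on success, 0 on malformed input.
 * The column-then-row order is an assumption of this sketch. */
static int parse_uniform_tiles(const char *value, int *columns, int *rows)
{
  int c = 0;
  int r = 0;
  int consumed = 0;
  assert(value != NULL && columns != NULL && rows != NULL);
  /* %n records how many characters were consumed so far. */
  if (sscanf(value, "%dx%d%n", &c, &r, &consumed) != 2)
    return 0;
  /* Reject trailing characters and non-positive counts. */
  if (value[consumed] != '\0' || c < 1 || r < 1)
    return 0;
  *columns = c;
  *rows = r;
  return 1;
}
```

With this shape, 2x2 and 4x1 are accepted, while the split-style value
u2xu2 is rejected and would have to go through --tiles-width-split and
--tiles-height-split instead.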

There was an off-by-one error in the dependency-setting code, which
left dependencies unset and caused checksum errors, for example with
ref_neg=1 and owf=1.