Build Kvazaar as usual with make, then edit extract_rdcosts.py so that its parameters suit your setup (the directories, the number of threads, and the Kvazaar parameters), and run extract_rdcosts.py. It will run a lot of Kvazaar instances in parallel to encode a lot of videos and collect all the coeff groups for which they measure an RD cost. The coeff groups will be written into the relevant data file (gzipped) in the following format:

Size (B)  | Description
----------+------------
4         | size: Coeff group size, in int16's
4         | ccc: Coeff group's coding cost
size * 2  | coeffs: Coeff group data

You can roll your own filter_rdcosts.c program to analyze the data the way you want (a minimal sketch of such a filter is given at the end of this file), and run it like:

$ gzip -d < /path/to/compressed_datafile.gz | ./filter_rdcosts | less

Maybe one day there will be a multithreaded script like extract_rdcosts.py to automate and parallelize the processing of a massive heap of data files.

EDIT: It is now possible to do OLS regression by streaming the source data twice and using Octave to invert the temporary result matrix; that is what run_filter.py does, in parallel. To do this on data you have gathered with extract_rdcosts.py:

$ gcc filter_rdcosts.c -o frcosts_matrix
$ gcc ols_2ndpart.c -o ols_2ndpart
$ ./run_filter.py

You should probably adjust the run_filter.py parameters before actually running it, though.
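A note on why two streaming passes are enough (this is just the textbook normal-equations form of OLS, not something specific to these tools): the fitted weights are

  beta = (X^T X)^-1 (X^T y)

where, presumably, each row of X holds one coeff group's features and y holds the measured coding costs. Both X^T X and X^T y are small accumulators that can be built up one row at a time, so the first pass (frcosts_matrix) can emit the temporary matrix X^T X, Octave can invert it offline, and the second pass (ols_2ndpart) can apply the inverse while streaming the same data again. Neither pass has to hold the full data set in memory; the exact features used as X are defined by the filter programs themselves.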
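As mentioned above, here is a minimal sketch of a do-it-yourself filter program. It is not the filter_rdcosts.c shipped with these tools; it only assumes the record layout from the table above, with size and ccc stored as native-endian 32-bit unsigned integers, and the file name example_filter.c and the printed statistic are made up for illustration:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  uint32_t size;
  uint32_t ccc;

  /* One record per coeff group: 4 B size, 4 B ccc, then size int16's. */
  while (fread(&size, sizeof(size), 1, stdin) == 1) {
    if (fread(&ccc, sizeof(ccc), 1, stdin) != 1) {
      fprintf(stderr, "truncated record header\n");
      return EXIT_FAILURE;
    }

    int16_t *coeffs = malloc((size_t)size * sizeof(int16_t));
    if (size > 0 && coeffs == NULL) {
      fprintf(stderr, "out of memory\n");
      return EXIT_FAILURE;
    }
    if (fread(coeffs, sizeof(int16_t), size, stdin) != size) {
      fprintf(stderr, "truncated coeff data\n");
      free(coeffs);
      return EXIT_FAILURE;
    }

    /* Example analysis: count the nonzero coefficients in the group. */
    uint32_t nonzero = 0;
    for (uint32_t i = 0; i < size; i++) {
      if (coeffs[i] != 0) {
        nonzero++;
      }
    }
    printf("size=%" PRIu32 " ccc=%" PRIu32 " nonzero=%" PRIu32 "\n",
           size, ccc, nonzero);

    free(coeffs);
  }
  return EXIT_SUCCESS;
}

Compile it and feed it decompressed data just like any other filter:

$ gcc example_filter.c -o example_filter
$ gzip -d < /path/to/compressed_datafile.gz | ./example_filter | less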