Hacker News new | ask | show | jobs
by slaymaker1907 1738 days ago
One thing you could try to use are variance optimal histograms. These are histograms which set bucket boundaries such that the weighted average variance in the buckets is minimized. Unfortunately, this algorithm is quadratic with the number of data points, but you could try approximating the optimum by taking random subsets of the data, computing the histogram, then seeing how well the histogram does on the whole dataset.

http://www.mathcs.emory.edu/~cheung/Courses/584/Syllabus/pap...