by hcles 5461 days ago

I don't know of a book reference, but if you sit down for half an hour and play through the algebra of stddev you can figure out how to combine multiple standard deviations into a single stddev. It's fairly simple and quite satisfying. The sum of a set can be recovered by multiplying its mean by its number of elements, and its sum of squares can be recovered as nelems * (stddev^2 + mean^2). So, by storing the number of elements, the mean and the population stddev of each set, you can take two or more sets of numbers and compute their combined standard deviation with a simple formula based on their stddev, mean and nelems.

There's a Wikipedia entry on combining standard deviations (Standard_deviation#Combining_standard_deviations) but it's dense if you don't have a math background (I don't have one). The crux of it is that to compute the stddev of a set you need to compute the average, then the sum of squares of the deltas from it [i.e. sum((elem[i] - avg)^2) over all i], then divide by the number of elements and take the square root. You don't have the individual elements any more, so you can't compute the sum of squared deltas directly, but using the stored stddevs, means and counts you can recover that information when computing the new stddev.

Well, it's a lot easier to explain with a whiteboard. You're basically reconstructing the information you don't have from the stddev/mean-aka-avg/nelems data you do have: you rebuild each set's sum and sum of squares, add them up across sets, subtract out the part that belongs to the combined mean, and it all works out exactly.
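
For anyone who wants the whiteboard version written down, here is a sketch of the algebra. The symbols are introduced here for illustration and are not in the original comment: n_i, \mu_i and \sigma_i are the nelems, mean and population stddev of set S_i, and N is the total number of elements.

```latex
\begin{align*}
\sigma_i^2 &= \frac{1}{n_i}\sum_{x \in S_i} x^2 - \mu_i^2
  \quad\Longrightarrow\quad
  \sum_{x \in S_i} x^2 = n_i\,(\sigma_i^2 + \mu_i^2), \\
\sum_{x \in S_i} x &= n_i\,\mu_i, \qquad N = \sum_i n_i, \\
\sigma^2 &= \frac{1}{N}\sum_i n_i\,(\sigma_i^2 + \mu_i^2)
  - \left(\frac{1}{N}\sum_i n_i\,\mu_i\right)^2 \\
  &= \frac{1}{N}\left[\sum_i n_i\,(\sigma_i^2 + \mu_i^2)
  - \frac{\left(\sum_i n_i\,\mu_i\right)^2}{N}\right].
\end{align*}
```

The two terms inside the final bracket correspond to numsum1 and numsum2 in the formulas that follow.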

numsum1 = SUM(nelems[i] * (stddev[i]^2 + mean[i]^2))

numsum2 = (SUM(nelems[i] * mean[i]))^2 / SUM(nelems[i])

combined_stddev = SQRT((numsum1 - numsum2) / SUM(nelems[i]))
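
The three formulas above translate almost line for line into code. What follows is a minimal sketch in Python, not a definitive implementation: the function names `summarize` and `combine_stddev` and the `(nelems, mean, stddev)` tuple layout are assumptions made for illustration, and the stddevs are taken to be population stddevs, as in the comment.

```python
import math
import statistics


def summarize(values):
    """Reduce a list of numbers to (nelems, mean, population stddev).

    Hypothetical helper: this is the per-set data the comment says to store.
    """
    return (len(values), statistics.fmean(values), statistics.pstdev(values))


def combine_stddev(groups):
    """Combine (nelems, mean, population stddev) tuples into one stddev."""
    groups = list(groups)
    total_n = sum(n for n, _, _ in groups)
    if total_n == 0:
        raise ValueError("cannot combine empty groups")

    # numsum1: total sum of squares, rebuilt from each set's stddev and mean.
    numsum1 = sum(n * (sd ** 2 + mean ** 2) for n, mean, sd in groups)
    # numsum2: (total sum)^2 / total count; note the square is on the whole sum.
    numsum2 = sum(n * mean for n, mean, _ in groups) ** 2 / total_n

    # Clamp at zero so rounding error cannot produce a tiny negative variance.
    variance = max((numsum1 - numsum2) / total_n, 0.0)
    return math.sqrt(variance)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0]
    # The two printed values should agree up to floating-point rounding.
    print(combine_stddev([summarize(a), summarize(b)]))
    print(statistics.pstdev(a + b))
```

One caveat: because the result is a difference of two large sums, this form can lose precision in floating point when the means are very large compared to the stddevs.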

1 comment

ghotli 5457 days ago

I find this endlessly interesting. Thanks for the overview.
