Hacker News
by SeaGully 478 days ago
The other guy gives a solid explanation, so don't treat mine as a replacement for it or as a sign that his is wrong.

As I see it, there are two ways to approach what I think you're asking about (the sample variance, I assume).

(1) The sample variance depends on the sample mean, which is sum(x_i) / n. If you know the sample mean and the first n-1 of the n samples, the last value is already fixed: x_n = n * sample_mean - (x_1 + ... + x_(n-1)). So only n-1 of the values are free to vary, which is one way to read n-1 as the "degrees of freedom". Higher sample moments can be roughly understood with the same degrees-of-freedom argument. I could be wrong about that part, though; it's just something I remember from somewhere.
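Here's a quick Python sketch of point (1), assuming all you know is the sample mean and the first n-1 values. recover_last is just a hypothetical helper name I made up for illustration; it shows that the nth value has no freedom left once the mean is known.

```python
import random


def recover_last(sample_mean: float, first_values: list) -> float:
    """Hypothetical helper: given the mean of n values and the first n-1 of them,
    return the nth value, which those two pieces of information fully determine."""
    n = len(first_values) + 1
    return n * sample_mean - sum(first_values)


# Exact example: the mean of [1, 2, 4, 5] is 3, so x_4 must be 4 * 3 - (1 + 2 + 4) = 5.
print(recover_last(3.0, [1.0, 2.0, 4.0]))  # 5.0

# Random example: "forget" the last value, then get it back from the mean.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(5)]
mean = sum(xs) / len(xs)
recovered = recover_last(mean, xs[:-1])
print(xs[-1], recovered)  # equal up to floating-point rounding
```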

(2) The more mathematical way: the biased sample variance is biased_sample_variance = sum((x_i - sum(x_i) / n)^2) / n. Its expected value (its average over many independent samples of size n) is not the population variance but (n - 1) / n * population_variance, i.e. it is biased low. That happens because the deviations are measured from the sample mean, which is fitted to the same data, so they come out systematically smaller than deviations from the true mean would. Multiplying biased_sample_variance by n / (n - 1) removes the bias and gives the unbiased sample variance: sum((x_i - sum(x_i) / n)^2) / (n - 1). The math is rather fun, in my opinion, once you get into the swing of things.
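If you want to see the bias numerically, here's a small Monte Carlo sketch in Python. It assumes normally distributed data with mean 0 and population variance 1; biased_var, unbiased_var and average_estimates are just names I picked, not from any library. With n = 5, the biased estimate should average out near (n - 1) / n = 0.8 and the corrected one near 1.

```python
import random


def biased_var(xs):
    """Divide the sum of squared deviations from the sample mean by n."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def unbiased_var(xs):
    """Same sum of squares, divided by n - 1 instead (Bessel's correction)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def average_estimates(n, trials, sigma=1.0, seed=0):
    """Draw `trials` independent samples of size n and average both estimators."""
    rng = random.Random(seed)
    total_b = 0.0
    total_u = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_b += biased_var(xs)
        total_u += unbiased_var(xs)
    return total_b / trials, total_u / trials


n = 5
b, u = average_estimates(n, trials=100_000)
print(f"biased average   ~ {b:.3f} (theory (n-1)/n = {(n - 1) / n:.3f})")
print(f"unbiased average ~ {u:.3f} (theory 1.000)")
```

For any single sample, unbiased_var is exactly biased_var * n / (n - 1); the correction only shows up as "unbiased" once you average over many samples.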

I sure do hope I understood your question correctly.