by anonymousDan 4129 days ago
But why take the square and not just the absolute value of the differences? Is the idea to emphasize outliers and hence give higher variance to skewed datasets?

The real idea is that you have an implicit model, specifically a normal distribution. The variance is one of the parameters of the normal distribution (the other being the mean).
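To make the link between the normal model and squared differences concrete, here is a minimal sketch. It assumes a small made-up data set and a hand-written log-likelihood (neither comes from the thread). Fitting a normal distribution by maximum likelihood gives the sample mean and the mean squared difference from it, which is the variance.

```python
import math

def normal_log_likelihood(data, mu, var):
    # Log-likelihood of independent draws from a normal with mean mu and variance var.
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * var) - squared_error / (2 * var)

# Made-up data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Maximum likelihood estimates: the mean and the mean squared deviation.
mu_hat = sum(data) / len(data)
var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)

best = normal_log_likelihood(data, mu_hat, var_hat)
print(f"mu_hat={mu_hat}, var_hat={var_hat}, log-likelihood={best:.4f}")

# Nudging either parameter away from the estimates lowers the likelihood.
for mu, var in [(mu_hat + 0.5, var_hat), (mu_hat, var_hat * 1.5), (mu_hat, var_hat * 0.5)]:
    print(f"mu={mu}, var={var}: {normal_log_likelihood(data, mu, var):.4f}")
```

Swapping in a Laplace (double exponential) model instead gives the absolute-value analogue: the fitted location is the median and the fitted scale is the mean absolute deviation around it.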

A normal distribution is a good implicit model to choose - the central limit theorem and similar laws suggest that lots of other distributions will asymptotically approach it. But it's not always the right choice - e.g., it's a disaster when you have power law tails, or low frequency high amplitude noise.

Could you give some examples or more detail about power law tails?
So the CLT says the (suitably normalized) sum of independent random variables with rapidly decaying tails, i.e. finite variance, will approach a normal distribution. There are similar results showing that the sum of random variables with slowly decaying tails approaches a stable distribution:

https://en.wikipedia.org/wiki/Stable_distribution

This makes the stable distribution the right answer under some circumstances.
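A rough way to see why variance misbehaves with power law tails is to simulate it. This is a sketch with assumptions of my own: the Cauchy distribution stands in as the heavy-tailed example (it is a stable distribution with no finite variance), and the seed, sample size, and helper names are arbitrary. It measures how much of the sum of squares comes from the single largest draw.

```python
import math
import random

def cauchy_sample(rng):
    # Inverse-CDF draw from a standard Cauchy distribution.
    return math.tan(math.pi * (rng.random() - 0.5))

def largest_share_of_squares(xs):
    # Fraction of the sum of squared values contributed by the single largest one.
    squares = [x * x for x in xs]
    return max(squares) / sum(squares)

rng = random.Random(0)  # fixed seed, chosen arbitrarily
n = 10_000

normal_xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
cauchy_xs = [cauchy_sample(rng) for _ in range(n)]

normal_share = largest_share_of_squares(normal_xs)
cauchy_share = largest_share_of_squares(cauchy_xs)

print(f"normal: largest squared value is {normal_share:.2%} of the total")
print(f"cauchy: largest squared value is {cauchy_share:.2%} of the total")
```

For the normal draws no single point matters much, so the sample variance settles down as n grows. For the Cauchy draws one or two extreme values tend to dominate the sum of squares, so the sample variance keeps jumping around instead of converging.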

For different test statistics (e.g. max drawdown), you've got similar fat tailed distributions, e.g. GEV:

https://en.wikipedia.org/wiki/Generalized_extreme_value_dist...

As an example of how you might use slowly decaying distributions, consider this example of Cauchy PCA:

http://arxiv.org/pdf/1412.6506v1.pdf

I'm working on a blog post explaining the use of fat tailed distributions for linear regression in a Bayesian context.

Right, I stopped short because I reached the point at which I'd have to Google to double check anything.

Yeah, you've invented the 'mean absolute deviation' (or 'average absolute deviation'), which might be better than variance or standard deviation (the square root of variance) in some circumstances. It's been debated for 100 years: http://www.leeds.ac.uk/educol/documents/00003759.htm
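To compare the two measures side by side, here is a minimal sketch. The `spread` helper and both data sets are made up for illustration; it uses the population (divide by n) form of the variance.

```python
from math import sqrt

def spread(data):
    # Returns (variance, standard deviation, mean absolute deviation),
    # all measured around the mean and divided by n.
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    mean_abs_dev = sum(abs(x - mean) for x in data) / n
    return variance, sqrt(variance), mean_abs_dev

# Made-up data; the second list replaces the 9 with an outlier of 29.
clean = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = [2, 4, 4, 4, 5, 5, 7, 29]

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    variance, std_dev, mean_abs_dev = spread(data)
    print(f"{name}: variance={variance:.3f} std={std_dev:.3f} mean abs dev={mean_abs_dev:.3f}")
```

The outlier inflates the standard deviation by a larger factor than the mean absolute deviation, which is the "more weight to outliers" effect of squaring.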

Part of the reason for using variance might be like you said, to give more weight to outliers.

Part of the reason variance and standard deviation might be more popular is that data are often assumed to be roughly normally distributed, and the standard deviation is one of that distribution's parameters. There are also a lot of formulas and calculations, invented before computers, that are apparently easier to do with variance and standard deviation. Manipulating equations with absolute values is trickier.

There are also some mental shortcuts you can take if you know the standard deviation of a set of data. If the ratings are roughly normally distributed, about 95% of them fall within 2 standard deviations of the mean. Thus, if you want to buy a car rated 8 on average and want to be at least 95% sure that the particular car you buy is at least a 7, check that the standard deviation of the ratings is no more than 0.5 (then only about 2.5% of ratings fall below 7). Probably not a great example. Imagine instead you are buying oranges that are 8 out of 10 quality-wise on average, and you want to be confident that 95% of the oranges are at least a 7, so that you don't have to throw out too many. See https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rul...
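The orange numbers can be checked with Python's standard library. This is only a sketch, and it assumes the ratings follow a normal distribution with mean 8, which the example takes for granted but real ratings may not.

```python
from statistics import NormalDist

# Assumption: ratings are normally distributed with mean 8 and standard deviation 0.5.
ratings = NormalDist(mu=8.0, sigma=0.5)

# Share of ratings within 2 standard deviations of the mean (between 7 and 9).
within_two_sd = ratings.cdf(9.0) - ratings.cdf(7.0)

# Share of ratings that are at least 7 (everything except the lower tail).
at_least_7 = 1.0 - ratings.cdf(7.0)

# Largest standard deviation that still keeps 95% of ratings at or above 7.
z_95 = NormalDist().inv_cdf(0.95)
max_sigma = (8.0 - 7.0) / z_95

print(f"within 2 sd: {within_two_sd:.4f}")
print(f"at least 7:  {at_least_7:.4f}")
print(f"max sigma for 95% at least 7: {max_sigma:.3f}")
```

Because only the lower tail matters for "at least a 7", a standard deviation of 0.5 is a bit stricter than needed: under the normal assumption, roughly 0.6 would already do.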

I don't know, here are some other suggested reasons for using variance & standard deviation instead of mean absolute differences: https://stats.stackexchange.com/questions/118/why-square-the... https://www.quora.com/Why-do-we-square-instead-of-using-the-...

Variance is also tied into the central limit theorem:

http://blog.gembaacademy.com/2007/07/16/explaining-the-centr...

One interesting benefit of the squared difference is that it's differentiable everywhere, while the absolute difference isn't differentiable at zero, which can come in handy for some optimization problems.
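Here is a small sketch of what that means in practice, using plain fixed-step gradient descent to find the single value c that best summarizes some data. The function names, step size, and data are all made up for illustration. The squared loss has a smooth gradient that shrinks near the minimum (the mean); the absolute loss only has a sign-based subgradient that jumps at each data point, so the same procedure ends up hovering near the minimum (the median) instead of settling.

```python
def grad_squared(c, data):
    # Derivative of the mean squared difference (c - x)^2 with respect to c.
    return sum(2 * (c - x) for x in data) / len(data)

def subgrad_abs(c, data):
    # Subgradient of the mean absolute difference |c - x|: the average sign of (c - x).
    # Using 0 at the kink c == x is a convention chosen here.
    return sum((c > x) - (c < x) for x in data) / len(data)

def gradient_descent(grad, data, start=0.0, step=0.1, iters=200):
    # Plain fixed-step descent; no line search or step decay.
    c = start
    for _ in range(iters):
        c -= step * grad(c, data)
    return c

# Made-up data with one large value: mean is 22, median is 3.
data = [1, 2, 3, 4, 100]

c_squared = gradient_descent(grad_squared, data)
c_abs = gradient_descent(subgrad_abs, data)

print(f"squared loss minimizer:  {c_squared:.6f}")
print(f"absolute loss minimizer: {c_abs:.6f}")
```

In practice the absolute loss is still perfectly usable (decaying step sizes, or just taking the median directly), it just needs a bit more care than the squared loss.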