Hacker News
by gpsx 3495 days ago
I think the good reason for minimizing the square of the errors is that, if the errors in your data have a Gaussian probability distribution, minimizing the squared error is equivalent to maximizing the likelihood of the measurements, as you and others have said.
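To make the likelihood connection concrete, here is a minimal sketch. It assumes i.i.d. Gaussian errors with a known, fixed standard deviation `sigma`. The data values and the helper names `gaussian_nll` and `sum_squared_error` are hypothetical. The Gaussian negative log-likelihood differs from the sum of squared errors only by a constant offset and a positive scale factor 1/(2σ²), so both criteria pick the same best estimate (here, the sample mean).

```python
import math

def gaussian_nll(data, mu, sigma):
    """Negative log-likelihood of i.i.d. Gaussian data with mean mu and std sigma."""
    n = len(data)
    sse = sum((x - mu) ** 2 for x in data)
    # n/2 * log(2*pi*sigma^2) does not depend on mu; only the SSE term does.
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sse / (2 * sigma ** 2)

def sum_squared_error(data, mu):
    """Plain least-squares criterion for a single location parameter mu."""
    return sum((x - mu) ** 2 for x in data)

data = [2.1, 1.9, 2.4, 2.0, 1.6]  # hypothetical measurements
sigma = 0.3                       # assumed known noise level

# Scan candidate values of mu on a grid (1.000 .. 3.000) and pick the best
# value under each criterion.
grid = [i / 1000 for i in range(1000, 3001)]
best_ls = min(grid, key=lambda m: sum_squared_error(data, m))
best_ml = min(grid, key=lambda m: gaussian_nll(data, m, sigma))

print(best_ls, best_ml, sum(data) / len(data))
```

Under these assumptions, the least-squares and maximum-likelihood estimates coincide. With fat-tailed errors, the likelihood is no longer a rescaled sum of squares, and the two would generally differ.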

Why do we assume Gaussian errors? There is seldom a truly Gaussian distribution in the real world, usually because the probability of large error values doesn't decay that fast. We use it because the math is easy and we can actually solve the problem under that assumption.
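To illustrate what "decay" means here, the sketch below compares the probability of an error larger than k units for a standard Gaussian and for a standard Cauchy distribution. The Cauchy is chosen only as a familiar example of a fat-tailed distribution; the source doesn't name one. The Gaussian tail shrinks roughly like exp(-k²/2), while the Cauchy tail shrinks only like 2/(πk).

```python
import math

def gaussian_tail(k):
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))

def cauchy_tail(k):
    """P(|X| > k) for a standard Cauchy X (an example fat-tailed distribution)."""
    return 1 - (2 / math.pi) * math.atan(k)

for k in (1, 3, 5, 10):
    print(f"k={k:>2}  gaussian={gaussian_tail(k):.3e}  cauchy={cauchy_tail(k):.3e}")
```

At k = 5, the Gaussian assigns well under a one-in-a-million chance to such an error, while the Cauchy assigns more than 10%. Real measurement errors often land somewhere in between.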


That's a summary of the article.
Yes, sort of. But I think he says a lot of unnecessary things without getting at the root of the issue.

I left out a detail I should have mentioned: what is so special about a Gaussian that makes the math easy. So I will say it.

A single measurement implies a probability distribution for the measured quantity. A second measurement, on its own, also implies a probability distribution for the measured quantity. If we consider both measurements together, we get yet another probability distribution for it. The magic is that if the distributions for the individual measurements are Gaussian, then the distribution for the combined measurements is also Gaussian. This is not true in general. As long as we have Gaussian distributions, we can do all the operations we want, the resulting distributions stay Gaussian, and each can be fully described by a center point and a width. (Forgive me for the liberties I am taking here.) Without that property, the basic alternative to solving the problem exactly is to carry around the full probability distribution functions, which is not practical even with very powerful computers.
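Here is a minimal sketch of the "combined measurements stay Gaussian" point. It assumes two independent measurements of the same quantity, each described by a mean and a variance (the "center point and width"). The measurement values and the helper names `combine` and `gaussian_pdf` are hypothetical. It uses the standard result that multiplying two Gaussian densities gives, after normalization, a Gaussian whose precision (1/variance) is the sum of the input precisions. It then checks that result numerically on a grid.

```python
import math

def combine(mu1, var1, mu2, var2):
    """Combine two independent Gaussian measurements of the same quantity.

    The normalized product of the two densities is Gaussian: its precision is
    the sum of the input precisions, and its mean is the precision-weighted
    average of the input means.
    """
    precision = 1 / var1 + 1 / var2
    var = 1 / precision
    mu = var * (mu1 / var1 + mu2 / var2)
    return mu, var

def gaussian_pdf(x, mu, var):
    """Density of a Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical measurements: 10.0 with std 2.0, and 12.0 with std 1.0.
mu, var = combine(10.0, 2.0 ** 2, 12.0, 1.0 ** 2)
print(mu, math.sqrt(var))

# Numerical check: normalize the product of the two densities on a grid
# (0.00 .. 24.00) and compare it pointwise with the closed-form Gaussian.
dx = 0.01
xs = [i * dx for i in range(0, 2401)]
prod = [gaussian_pdf(x, 10.0, 4.0) * gaussian_pdf(x, 12.0, 1.0) for x in xs]
z = sum(prod) * dx
max_err = max(abs(p / z - gaussian_pdf(x, mu, var)) for p, x in zip(prod, xs))
print(max_err)
```

The combined estimate lands closer to the more precise measurement, and its width is smaller than either input's. Two numbers still describe it completely. With non-Gaussian inputs, the normalized product generally has no such two-number summary, which is the practical difficulty described above.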

I'm sorry, but what do you mean by "decay"?

You're talking about fat tails?