by bgbntty2
160 days ago
I haven't dealt with statistics for a while, but what I don't get is: why squares specifically? Why not a power of 1, or 3, or 4, or anything else? I've seen squares come up a lot in statistics. One explanation I didn't really like is that squares are easier to work with because you don't need abs(), since everything is positive. OK, but then why not another even power, like 4? Different powers should give different results, which seems like a big deal, because statistics is used to explain important things and to guide our lives with respect to those important things. What makes squares the best? My memories of my statistics training are quite blurry now, so I can't recall the specific places I've seen squares used, but they seem to pop up in statistics relatively often.
If your model is different (y = Ax + b + e where the error e is not normal), then a different penalty function may be more appropriate. In the real world this is actually very often the case, because the error can be long-tailed. A power of 1 (absolute error) is sometimes used. Also common is the Huber loss function, which coincides with e^2 (the squared residual) for small values of e but grows linearly for larger values. This has the effect of putting less weight on outliers: it is "robust".
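To make the difference between penalties concrete, here is a minimal Python sketch. It simplifies the y = Ax + b + e model down to fitting a single constant c to made-up data with one outlier, so the only thing that changes between runs is the penalty. The Huber threshold m = 1 and the helper names (`squared`, `absolute`, `huber`, `fit_location`) are illustrative assumptions; the Huber form used is r^2 for |r| <= m and m(2|r| - m) beyond, which matches e^2 exactly inside the threshold. Minimizing squared loss gives the sample mean and absolute loss gives the median; Huber lands near the median.

```python
from typing import Callable, Sequence

Loss = Callable[[float], float]


def squared(r: float) -> float:
    # Power 2: the least-squares penalty.
    return r * r


def absolute(r: float) -> float:
    # Power 1: least absolute deviations.
    return abs(r)


def huber(r: float, m: float = 1.0) -> float:
    # Equal to r**2 for |r| <= m (hypothetical threshold m), then linear
    # with slope 2m, so large residuals are penalized less than r**2 would.
    a = abs(r)
    return r * r if a <= m else m * (2 * a - m)


def fit_location(data: Sequence[float], loss: Loss, iters: int = 200) -> float:
    """Find c minimizing sum(loss(x - c)) by ternary search.

    Valid because each loss above is convex, so the total is convex in c.
    """
    def total(c: float) -> float:
        return sum(loss(x - c) for x in data)

    lo, hi = min(data), max(data)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if total(m1) < total(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


data = [1.0, 2.0, 2.5, 4.0, 100.0]  # made-up sample; 100.0 is the outlier
for name, loss in [("squared", squared), ("absolute", absolute), ("huber", huber)]:
    print(name, round(fit_location(data, loss), 3))
# squared 21.9
# absolute 2.5
# huber 2.75
```

The single outlier drags the least-squares answer far away from the bulk of the data, while the power-1 and Huber penalties barely move.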
In principle, if you knew the distribution of the noise/error, you could calculate the correct penalty function to give the maximum likelihood estimate. More on this (with explicit formulas) in Boyd and Vandenberghe's "Convex Optimization" (freely available on their website), pp. 352-353.
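A sketch of that calculation, assuming independent residuals e_i = y_i - (A x_i + b) that all share a density p: maximizing the likelihood is the same as minimizing the sum of -log p(e_i), so the penalty is phi(e) = -log p(e). Additive constants do not change the minimizer. The scale symbols sigma, beta, alpha and the shape k are introduced here for illustration.

```latex
\begin{aligned}
(\hat A, \hat b) &= \arg\max_{A,b} \prod_i p\big(y_i - (A x_i + b)\big)
  = \arg\min_{A,b} \sum_i \phi\big(y_i - (A x_i + b)\big),
  \qquad \phi(e) = -\log p(e) \\
\text{Gaussian: } p(e) &= \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-e^2/(2\sigma^2)}
  \;\Rightarrow\; \phi(e) = \frac{e^2}{2\sigma^2} + \log\!\big(\sqrt{2\pi}\,\sigma\big) \\
\text{Laplacian: } p(e) &= \frac{1}{2\beta}\, e^{-|e|/\beta}
  \;\Rightarrow\; \phi(e) = \frac{|e|}{\beta} + \log(2\beta) \\
\text{Generalized normal: } p(e) &\propto e^{-|e/\alpha|^{k}}
  \;\Rightarrow\; \phi(e) = \left|\frac{e}{\alpha}\right|^{k} + \text{const}
\end{aligned}
```

So squares correspond to normal noise, a power of 1 to Laplacian (heavier-tailed) noise, and a power like 4 to noise with much lighter tails than the normal distribution.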
Edit: I remembered another reason. Least-squares fits are also popular because ANOVA requires them. ANOVA is a very old and still-popular methodology for breaking down variance into components (this is what people mean when they say things like "75% of the variance is due to <predictor>"). ANOVA is fundamentally based on the Pythagorean theorem, which lives in Euclidean geometry and requires squares. So, as I understand it, ANOVA demands a least-squares fit, even if that isn't really appropriate for the situation.
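A small Python sketch of the Pythagorean identity behind that variance breakdown, on made-up data: for a least-squares line with an intercept, total sum of squares = explained sum of squares + residual sum of squares. For a line not obtained by least squares (here a hypothetical hand-picked slope 2, intercept 0), the identity generally fails. The function names are illustrative, not from any library.

```python
from typing import List, Tuple


def mean(v: List[float]) -> float:
    return sum(v) / len(v)


def least_squares_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Closed-form simple linear regression y ~ a*x + b (ordinary least squares)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx


def sums_of_squares(xs: List[float], ys: List[float], a: float, b: float):
    """Return (total, explained, residual) sums of squares for the line y = a*x + b."""
    my = mean(ys)
    fitted = [a * x + b for x in xs]
    sst = sum((y - my) ** 2 for y in ys)              # total
    ssr = sum((f - my) ** 2 for f in fitted)          # "explained"
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual
    return sst, ssr, sse


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]  # made-up data

a, b = least_squares_line(xs, ys)
sst, ssr, sse = sums_of_squares(xs, ys, a, b)
print(f"least squares: total={sst:.3f} explained+residual={ssr + sse:.3f}")
# least squares: total=38.900 explained+residual=38.900

sst2, ssr2, sse2 = sums_of_squares(xs, ys, 2.0, 0.0)  # hypothetical non-LS line
print(f"other line:    total={sst2:.3f} explained+residual={ssr2 + sse2:.3f}")
# other line:    total=38.900 explained+residual=40.100
```

With the least-squares line the residual vector is orthogonal to the centered fitted values, so the pieces add up and "fraction of variance explained" is well defined. With the other line, the "explained" part alone (40.0) exceeds the total, so the percentage breakdown stops making sense.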