| HN Mirror


by srean 3495 days ago

I cannot speak for eanzenberg but I think his comment was less about his personal justification and more about the rationalizations that have been used in the history of stats.

Gauss quite openly admitted that the choice was born out of convenience. The justification via the Normal (Gaussian) distribution came later, and the Gauss-Markov result, that least squares is the best linear unbiased estimator, came later still.

Even at the time Gauss proposed the loss, many of his peers (and perhaps Gauss himself) noted that other loss functions seem more appropriate if one goes by empirical performance, in particular the L1 distance.

Now that we have the compute power to deal with L1, it has come back with a vengeance, and people have been researching its properties with renewed earnestness. In fact, there is a veritable revolution going on right now in the ML and stats world around it.

Just as minimizing the squared loss gives you the conditional expectation, minimizing the L1 error gives you the conditional median. The latter is to be preferred when the distribution has a fat tail or is corrupted by outliers. This knowledge is nowhere close to being new; Gauss's peers knew it.
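A quick way to see the mean/median claim is the simplest possible model, a single constant with no covariates. The sketch below is only an illustration under that assumption: the data, the candidate grid, and the helper name `best_constant` are all made up for the example. It brute-forces the constant that minimizes each loss and compares it with the sample mean and median.

```python
from statistics import mean, median

def best_constant(data, loss, candidates):
    """Return the candidate c that minimizes sum(loss(x - c)) over the data."""
    return min(candidates, key=lambda c: sum(loss(x - c) for x in data))

# Toy sample with one gross outlier (illustrative values only).
data = [1, 2, 3, 4, 100]
candidates = range(0, 101)  # integer grid wide enough to cover the data

l2_fit = best_constant(data, lambda r: r * r, candidates)  # squared (L2) loss
l1_fit = best_constant(data, abs, candidates)              # absolute (L1) loss

print("L2 minimizer:", l2_fit, "| sample mean:", mean(data))
print("L1 minimizer:", l1_fit, "| sample median:", median(data))
```

The squared-loss fit gets dragged toward the outlier, while the absolute-loss fit stays with the bulk of the data.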

3 comments

Loic 3495 days ago

3 times yes: "The latter is to be preferred when the distribution [...] is corrupted by outliers."

I work in chemoinformatics. The main methods academics use to regress parameters have not changed in the past 40 years, even though we went from small, carefully assessed data sets (think 200 experimental points) to larger ones (10,000, sometimes millions) with a lot of outliers from data entry errors, experimental errors, etc.

The end result is that when I see models of interest without the raw data, I re-regress the parameters using my own datasets, because most of the time you can barely trust them (even when they come from well-known research centres).


mtzet 3495 days ago

> Gauss quite openly admitted that the choice was born out of convenience.

That's quite interesting. Do you have a reference for that?

From my understanding, the popularity of the least squares method came (at least in part) from Gauss' successful prediction of the position of Ceres. Was this just because people not using least squares were not able to calculate it?


riskneural 3495 days ago

It's in the original paper in which he derives the normal distribution. Well worth a read. I last had a copy of it in the fourth basement down in the university library about fifteen years ago - it might be still there.


srean 3495 days ago

I have come across his quote about convenience in many places, but don't have a specific reference. Perhaps The Google can help.

Another useful resource is "The Unicorn, The Normal Curve, And Other Improbable Creatures".


frankc 3495 days ago

Not disagreeing with your points about L1, but I want to point out that you can also do things to make L2 more robust to outliers (and get better empirical performance), such as winsorizing the data.
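For anyone who has not run into winsorizing: it clamps the most extreme values to the nearest retained value instead of dropping them. A minimal sketch, assuming a symmetric, count-based variant (the helper `winsorize` and its parameter `k` are made up for this illustration, and the data is a toy sample):

```python
from statistics import mean

def winsorize(data, k):
    """Clamp the k smallest and k largest values to the nearest retained value."""
    if 2 * k >= len(data):
        raise ValueError("k is too large for this sample")
    ordered = sorted(data)
    lo, hi = ordered[k], ordered[-k - 1]  # bounds taken from the retained values
    return [min(max(x, lo), hi) for x in data]

# Toy sample with one gross outlier (illustrative values only).
data = [1, 2, 3, 4, 100]
clipped = winsorize(data, k=1)

print("raw mean:", mean(data))
print("winsorized sample:", clipped)
print("winsorized mean:", mean(clipped))
```

After clamping, the ordinary mean (the L2 fit of a constant) is no longer dominated by the single outlier.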


srean 3495 days ago

Correct. In fact, such estimators would typically be more efficient than the median in many scenarios.
