by srean
3495 days ago
I cannot speak for eanzenberg, but I think his comment was less about his personal justification and more about the rationalizations that have been used in the history of stats. Gauss quite openly admitted that the choice was born of convenience. The justification using the Normal or Gaussian distribution came later, and the Gauss-Markov result on the conditional distribution came even later. Even at the time when Gauss proposed the loss, it was noted by many of Gauss's peers (and perhaps by Gauss himself) that other loss functions seem more appropriate if one goes by empirical performance, in particular the L1 distance. Now that we have the compute power to deal with L1, it has come back with a vengeance, and people have been researching its properties in renewed earnest. In fact there is a veritable revolution going on right now in the ML and stats world around it. Just as minimizing the squared loss gives you the conditional expectation, minimizing the L1 error gives you the conditional median. The latter is to be preferred when the distribution has a fat tail or is corrupted by outliers. This knowledge is nowhere close to being new. Gauss's peers knew this.
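To make the mean-versus-median point concrete, here is a minimal sketch in the unconditional case (a single sample, no regressors). The data values and the helper names `squared_loss`, `abs_loss` and `argmin_on_grid` are made up for illustration, and the brute-force grid search is only there to show where each loss bottoms out, not as a recommended way to fit anything.

```python
import statistics

def squared_loss(c, xs):
    """Sum of squared deviations of xs from the constant c (L2 loss)."""
    return sum((x - c) ** 2 for x in xs)

def abs_loss(c, xs):
    """Sum of absolute deviations of xs from the constant c (L1 loss)."""
    return sum(abs(x - c) for x in xs)

def argmin_on_grid(loss, xs, lo, hi, steps=10000):
    """Return the grid point in [lo, hi] with the smallest loss (brute force)."""
    best_c, best_val = lo, loss(lo, xs)
    for i in range(1, steps + 1):
        c = lo + (hi - lo) * i / steps
        val = loss(c, xs)
        if val < best_val:
            best_c, best_val = c, val
    return best_c

# Hypothetical sample: four well-behaved points plus one gross outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

l2_min = argmin_on_grid(squared_loss, data, 0.0, 100.0)
l1_min = argmin_on_grid(abs_loss, data, 0.0, 100.0)

print("squared-loss minimizer:", l2_min, "mean:", statistics.mean(data))
print("L1-loss minimizer:", l1_min, "median:", statistics.median(data))
```

On this sample the squared-loss minimizer lands on the mean, which the single outlier drags far away from the bulk of the data, while the L1 minimizer lands on the median and stays with the bulk.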
|
I work in chemoinformatics. The main methods academics use to regress parameters have not changed in the past 40 years, even though we went from small, carefully assessed data sets (think 200 experimental points) to larger ones (10,000, sometimes millions) with a lot of outliers from data entry errors, experimental errors, etc.
The end result is that when I see models of interest without the raw data, I re-regress the parameters using my own datasets, because most of the time you can barely trust the reported parameters (even if they come from well-known research centres).