
by Loic 3494 days ago

3 times yes: "The latter is to be preferred when the distribution [...] is corrupted by outliers."

I work in chemoinformatics. The main methods academics use to regress parameters have not changed in the past 40 years, even though we went from small, carefully assessed data sets (think 200 experimental points) to larger ones (10,000 points, sometimes millions) with a lot of outliers from data-entry errors, experimental errors, etc.

The end result is that when I see models of interest published without the raw data, I re-regress the parameters using my own datasets, because most of the time you can barely trust them (even when they come from well-known research centres).
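To make the outlier point concrete, here is a minimal pure-Python sketch. The comment doesn't say which robust method it has in mind, so this uses the Theil-Sen estimator (slope = median of all pairwise slopes), picked here only as one common robust alternative to ordinary least squares. The data set, the two "data entry" errors, and the function names `ols_fit` / `theil_sen_fit` are all made up for illustration.

```python
import statistics
from itertools import combinations


def ols_fit(xs, ys):
    """Ordinary least squares line fit; returns (slope, intercept)."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def theil_sen_fit(xs, ys):
    """Theil-Sen robust line fit: the slope is the median of all pairwise
    slopes, the intercept is the median of the residuals y - slope * x."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # skip pairs with identical x (undefined slope)
    ]
    slope = statistics.median(slopes)
    intercept = statistics.median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Hypothetical data on the line y = 2x + 1, with two fake "data entry" errors.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
ys[5] = 500
ys[15] = -300

if __name__ == "__main__":
    print("OLS:      ", ols_fit(xs, ys))        # slope pulled to about -4.05
    print("Theil-Sen:", theil_sen_fit(xs, ys))  # (2.0, 1.0)
```

Two bad points out of twenty are enough to flip the sign of the least-squares slope, while the median-based fit recovers the true line exactly. Note that the naive Theil-Sen above looks at all O(n²) pairs, so for data sets in the millions you would want a subsampled or faster variant.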