| HN Mirror



	by shawnz 3499 days ago
	I am no math expert, but I have always thought about it like this. The squared error is like weighting the error by the error. This causes one big error to be more significant than many small errors, which is usually what you want. Am I on the right track?
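To make the "weighting the error by the error" intuition concrete, here is a small sketch. The residual values are made up for illustration: one error of 10 versus ten errors of 1, so the total absolute error is the same in both cases.

```python
def squared_error(residuals):
    """Sum of squared residuals: each error is weighted by itself."""
    return sum(r * r for r in residuals)


def absolute_error(residuals):
    """Sum of absolute residuals: every unit of error counts the same."""
    return sum(abs(r) for r in residuals)


# Hypothetical residuals, chosen so both cases have the same absolute total.
one_big = [10.0]
many_small = [1.0] * 10

print(absolute_error(one_big), absolute_error(many_small))  # 10.0 10.0
print(squared_error(one_big), squared_error(many_small))    # 100.0 10.0
```

Under absolute error the two cases tie; under squared error the single big miss costs ten times as much.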

3 comments

robotresearcher 3499 days ago

> This causes one big error to be more significant than many small errors,

That's correct.

> which is usually what you want

Unless you have outliers, in which case it's what you don't want. So you use, e.g., a Huber loss function to reach a compromise.
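For reference, a minimal sketch of the Huber loss mentioned here, in its standard textbook form: quadratic for small residuals, linear for large ones. The threshold `delta=1.0` is an arbitrary choice for illustration, not something suggested in the thread.

```python
def huber(r, delta=1.0):
    """Huber loss for a single residual r.

    Quadratic (like squared error) when |r| <= delta,
    linear (like absolute error) beyond that.
    """
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


print(huber(0.5))   # 0.125, same as 0.5 * r**2
print(huber(10.0))  # 9.5, versus 50.0 for 0.5 * r**2
```

The two branches meet at `|r| == delta`, so the loss stays continuous while a large residual grows only linearly.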


dajohnson89 3499 days ago

I just thought it was to give positive and negative error values the same treatment. Moreover, I think it's debatable that one big error is more important than many small errors. That is conceivably a bad strategy in some cases -- if most points have low error, do you really want to penalize your candidate function for having a few bad outliers? To me that is no better than giving extra favor to a few points that happen to have low error.


tomp 3499 days ago

No, that's exactly why absolute error is better. "Big errors" are called outliers; they're (relatively) rare, often caused by bad data (measurement errors, typos, etc.), and they substantially influence the outcome of your calculation. In other words, squared error is less robust.

But squared error is easier to compute. So, in practice, what you do is you remove outliers (e.g. cap the data at +-3sigma) then use squared error.
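One way to see the robustness point: for a single constant estimate, the value that minimizes squared error is the mean, and the value that minimizes absolute error is the median. The sketch below uses made-up data in which one value looks like a typo.

```python
import statistics

# Hypothetical measurements; the last value in `dirty` stands in for a typo.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
dirty = [1.0, 2.0, 3.0, 4.0, 500.0]

# The mean minimizes the sum of squared errors.
print(statistics.mean(clean), statistics.mean(dirty))      # 3.0 102.0

# The median minimizes the sum of absolute errors.
print(statistics.median(clean), statistics.median(dirty))  # 3.0 3.0
```

A single bad point drags the squared-error estimate from 3 to 102, while the absolute-error estimate does not move.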


amelius 3499 days ago

> So, in practice, what you do is you remove outliers (e.g. cap the data at +-3sigma) then use squared error.

But if you are, say, fitting a function to the data, you can't tell beforehand which data points are the outliers. So in that case perhaps you need an iterative approach of removing them (?)
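One possible reading of that iterative approach is sketched below: fit a line by least squares, drop points whose residual exceeds 3 sigma of the current residuals, and refit until nothing more is dropped. The function names, the 3-sigma threshold, and the synthetic data are all assumptions made for illustration, not a method anyone in the thread specified.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def fit_with_clipping(xs, ys, n_sigma=3.0, max_iter=10):
    """Refit repeatedly, dropping points with residuals beyond n_sigma."""
    keep = list(range(len(xs)))
    for _ in range(max_iter):
        slope, intercept = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
        residuals = [ys[i] - (slope * xs[i] + intercept) for i in keep]
        sigma = statistics.pstdev(residuals)
        new_keep = [i for i, r in zip(keep, residuals) if abs(r) <= n_sigma * sigma]
        if len(new_keep) == len(keep):
            break  # nothing was dropped, so the fit has settled
        keep = new_keep
    # Final fit on whatever points survived.
    slope, intercept = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
    return slope, intercept, keep


# Synthetic data: y = 2x + 1 with +/-0.5 alternating noise and one bad point.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]
ys[10] += 50.0  # the injected outlier

slope, intercept, kept = fit_with_clipping(xs, ys)
print(slope, intercept, 10 in kept)
```

On this data the first pass flags only the injected point and the second pass drops nothing, so the loop stops after two fits. With several outliers, or outliers at the ends of the x range, a scheme like this can behave much less cleanly, because the outliers inflate sigma and pull the initial fit toward themselves.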
