	by throw_away_777 3499 days ago
	There is a Kaggle competition right now that uses mean absolute error (MAE), and this makes the problem substantially harder. For a practical discussion of techniques used to solve machine learning problems that use MAE, see the competition forums: https://www.kaggle.com/c/allstate-claims-severity/forums
	As touched upon in the article, the objective not being differentiable is a big deal for modern machine learning methods.

5 comments

haeffin 3499 days ago

Mean absolute error is differentiable almost everywhere. Objectives that are not differentiable everywhere, but are differentiable almost everywhere, are very common: in a deep net, if you have rectified linear activations (very common) or L1 regularisation (not unheard of), you have an objective that is not differentiable everywhere ... but the methods still work.

thanatropism 3499 days ago

No it isn't.

Differentiability is important if you want to have a closed-form formula and derive it in front of undergraduates.

throw_away_777 3499 days ago

This is the difference between practice and theory. In theory, differentiable objectives don't matter; in practice, for medium to large datasets, they make machine learning a lot faster. Speed is critical, as you need to be able to iterate quickly. The solution most commonly used on Kaggle is to transform the target feature and then minimize mean squared error, but there is some systematic uncertainty introduced by this.
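
A minimal sketch of the transform-then-MSE approach, under loud assumptions: the transform is taken to be a log (the comment does not say which one is used), the model is a single constant prediction, and the data `ys` is made up for illustration. It shows one way the mismatch can appear: the value that minimizes squared error in log space, mapped back, is not necessarily the value that minimizes MAE.

```python
import math
from statistics import mean, median

# Hypothetical, skewed target values (illustration only).
ys = [1.0, 2.0, 4.0, 8.0, 100.0]

# Minimizing squared error over a constant gives the mean, so in log space
# the fitted value is the mean of log(y).
log_fit = mean([math.log(y) for y in ys])

# Mapping back to the original scale gives the geometric mean of ys.
back_transformed = math.exp(log_fit)

# The constant that minimizes mean absolute error is the median.
mae_optimal = median(ys)


def mae(c):
    """Mean absolute error of the constant prediction c on ys."""
    return mean([abs(y - c) for y in ys])


print("back-transformed fit:", back_transformed, "MAE:", mae(back_transformed))
print("median:", mae_optimal, "MAE:", mae(mae_optimal))
```

On this toy data the back-transformed fit has a higher MAE than the median. How large the gap is on real data depends on how far the transformed target is from symmetric.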

hyperbovine 3499 days ago

You can just use subgradient descent. Nonconvex loss would pose a bigger problem.
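
A rough sketch of what subgradient descent on MAE can look like, assuming the simplest possible model (one constant prediction `c`), made-up data, and a 1/sqrt(t) step size. The names `sign`, `mae`, and `subgradient_descent_mae` are hypothetical helpers, not from any library. At the kink the code picks 0, which is a valid subgradient of |x| at x = 0.

```python
def sign(x):
    """A subgradient of |x|: the derivative away from 0, and 0 at the kink."""
    return (x > 0) - (x < 0)


def mae(c, ys):
    """Mean absolute error of the constant prediction c."""
    return sum(abs(y - c) for y in ys) / len(ys)


def subgradient_descent_mae(ys, c0=0.0, steps=2000, step0=1.0):
    """Minimize mean |y - c| over a constant c with a diminishing step size."""
    c = c0
    best_c, best_loss = c, mae(c, ys)
    for t in range(1, steps + 1):
        # d/dc |y - c| = -sign(y - c) wherever it exists; use 0 when y == c.
        g = -sum(sign(y - c) for y in ys) / len(ys)
        c -= step0 / t ** 0.5 * g
        loss = mae(c, ys)
        # Subgradient steps are not guaranteed to decrease the loss,
        # so keep the best iterate seen so far.
        if loss < best_loss:
            best_c, best_loss = c, loss
    return best_c


# Hypothetical data; the MAE-optimal constant is the median, 4.0.
ys = [1.0, 2.0, 4.0, 8.0, 100.0]
c_hat = subgradient_descent_mae(ys)
print(c_hat)
```

The iterate hovers around the median rather than landing on it exactly, which is why the step size shrinks over time and the best iterate is tracked.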

thomasahle 3499 days ago

> As touched upon in the article, the objective not being differentiable is a big deal for modern machine learning methods.

I'm not sure the absolute value is a big problem here. You still get a convex optimization problem. In neural networks a lot of people use ReLU or step activation functions, which are no more differentiable than the absolute value.

nightcracker 3499 days ago

What exactly would go wrong if you assume that the derivative is zero at x = 0?

And aren't exact zeroes an error scenario for most machine learning models anyway?
