Hacker News
by throw_away_777 3499 days ago
There is a Kaggle competition right now that uses mean absolute error (MAE), and this makes the problem substantially harder. For a practical discussion of techniques used to solve machine learning problems that use MAE, see the forums at: https://www.kaggle.com/c/allstate-claims-severity/forums

As touched upon in the article, the objective not being differentiable is a big deal for modern machine learning methods.

5 comments

Mean absolute error is differentiable almost everywhere. Objectives that are not differentiable everywhere, but are differentiable almost everywhere, are very common: in a deep net, if you have rectified linear activations (very common) or L1 regularisation (not unheard of), you have an objective that is not differentiable everywhere ... but the methods still work.
No, it isn't.

Differentiability is important if you want to have a closed-form formula and derive it in front of undergraduates.

This is the difference between practice and theory. In theory, differentiable objectives don't matter; in practice, for medium to large datasets, they make machine learning a lot faster. Speed is critical, as you need to be able to iterate quickly. The solution most commonly used on Kaggle is to transform the target feature and then minimize mean squared error, but there is some systematic uncertainty introduced by this.
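To make the "transform the target, then minimize mean squared error" idea concrete, here is a minimal sketch. The comment does not say which transform is used, so the log1p/expm1 pair is an assumption, as are the plain least-squares model and the function names. One possible source of the systematic effect mentioned above is that the model is fitted in the transformed space and the predictions are then mapped back.

```python
import numpy as np

def fit_log_target_least_squares(X, y):
    """Ordinary least squares on log1p(y); the log transform is an assumption."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, np.log1p(y), rcond=None)
    return coef

def predict_original_scale(coef, X):
    """Predict in log space, then map back to the original scale with expm1."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.expm1(A @ coef)
```

The squared error is minimized on log1p(y), not on y itself, so the back-transformed predictions are not guaranteed to minimize MAE or MSE on the original scale.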
You can just use subgradient descent. Nonconvex loss would pose a bigger problem.
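A rough sketch of what subgradient descent on MAE could look like for a linear model. The linear model, the 1/sqrt(t) step-size schedule, the default values and the function name are all assumptions made for illustration, not something taken from the competition. Because a subgradient step is not guaranteed to decrease the loss, the sketch keeps track of the best iterate seen so far.

```python
import numpy as np

def mae_subgradient_descent(X, y, steps=2000, lr0=0.5):
    """Minimise mean |X @ w - y| over w with subgradient descent (illustrative)."""
    n, d = X.shape
    w = np.zeros(d)
    best_w, best_loss = w.copy(), np.mean(np.abs(X @ w - y))
    for t in range(1, steps + 1):
        residual = X @ w - y
        # np.sign gives 0 where the residual is exactly 0, which is a
        # valid subgradient of the absolute value at that point.
        g = X.T @ np.sign(residual) / n
        w = w - (lr0 / np.sqrt(t)) * g  # diminishing step size
        loss = np.mean(np.abs(X @ w - y))
        if loss < best_loss:  # steps can increase the loss, so keep the best
            best_w, best_loss = w.copy(), loss
    return best_w, best_loss
```

The update has the same shape as ordinary gradient descent; the only difference is that sign(residual) stands in for the derivative at the points where it does not exist.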
> As touched upon in the article, the objective not being differentiable is a big deal for modern machine learning methods.

I'm not sure the absolute value is a big problem here. You still get a convex optimization problem. In neural networks a lot of people use ReLU or step activation functions, which are no more differentiable than the absolute value.

What exactly would go wrong if you assume that the derivative is zero at x = 0?
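One way to look at this question: for the absolute value, any number g in [-1, 1] is a valid subgradient at 0, meaning |z| >= g * z for every z, so picking 0 is one admissible choice among many. The small check below is only a sketch; the function names and the `at_zero` parameter are made up for illustration.

```python
import numpy as np

def abs_subgradient(x, at_zero=0.0):
    """Derivative of |x| away from 0; `at_zero` is the value chosen at x == 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x == 0.0, at_zero, np.sign(x))

def is_valid_subgradient_at_zero(g, zs):
    """Check the subgradient inequality |z| >= |0| + g * (z - 0) on the grid zs."""
    zs = np.asarray(zs, dtype=float)
    return bool(np.all(np.abs(zs) >= g * zs))
```

On a grid of test points, values of g between -1 and 1 (including 0) pass the check, while a value such as 1.5 fails it.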

And aren't exact zeroes an error scenario for most machine learning models anyway?