by highd 3494 days ago
"if the best model is not differentiable, you should still use it."

I'm not sure I would say that - neural nets are "near everywhere differentiable", for example. Without differentiability we're stuck with, for example, discrete GAs for optimization, and you can throw all your intuition out the window (not to mention training/learning efficiency).

1 comments

A few misconceptions I should correct in this comment.

- There is plenty of existing technology for handling non-differentiable functions. Functions like the absolute value, the 2-norm, and so on have a generalization of the gradient (the subgradient) which can be used in lieu of the gradient.

- The idea that being "almost everywhere differentiable" (i.e. the non-differentiability lies on a set of measure zero) makes these functions behave pretty much like smooth ones. This is often not the case, as optima often conspire to lie exactly on these nonsmooth manifolds.
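
To make both points concrete, here is a rough sketch (my own toy example, not from any library): a subgradient method on f(x) = sum_i |x - a_i|. The function names, the data, and the 1/k step size are all assumptions picked for illustration. The minimizer is the median of the data, which sits exactly on a kink of f.

```python
def f(x, data):
    # Sum of absolute deviations: convex, but not differentiable at any data point.
    return sum(abs(x - a) for a in data)


def subgradient(x, data):
    # One valid subgradient of f at x: sum of sign(x - a).
    # At a kink (x == a) any value in [-1, 1] is allowed; we pick 0.
    return sum((x > a) - (x < a) for a in data)


def subgradient_method(data, x0=0.0, steps=2000):
    # Subgradient steps are not guaranteed to decrease f,
    # so we keep track of the best iterate seen so far.
    x, best_x = x0, x0
    for k in range(1, steps + 1):
        g = subgradient(x, data)
        if g == 0:
            return x  # 0 is a subgradient here, so x is a global minimizer
        x -= g / k    # diminishing step size 1/k (an illustrative choice)
        if f(x, data) < f(best_x, data):
            best_x = x
    return best_x


x_star = subgradient_method([1.0, 2.0, 7.0])
print(x_star, f(x_star, [1.0, 2.0, 7.0]))
```

The result lands close to 2.0, the median, which is a point where f has no derivative. So "differentiable almost everywhere" is true of f, yet the one point you care about is in the measure-zero set.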

And error measures involving sums of absolute values (i.e., the L1 norm) are central to methods like the lasso (https://en.wikipedia.org/wiki/Lasso_(statistics)) and its cousins.
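
For what it's worth, the one-dimensional version of the lasso objective shows the same effect in closed form. This is a sketch under my own naming (`soft_threshold` and `objective` are hypothetical helpers, not any library's API): minimizing 0.5*(x - z)^2 + lam*|x| gives the soft-thresholding rule, which returns exactly 0 whenever |z| <= lam.

```python
def objective(x, z, lam):
    # 1-D lasso-style objective: squared error plus an L1 penalty.
    return 0.5 * (x - z) ** 2 + lam * abs(x)


def soft_threshold(z, lam):
    # Closed-form minimizer of objective(., z, lam) for lam >= 0.
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # the optimum sits exactly on the kink of |x|


for z in (2.0, 0.3, -2.0):
    x = soft_threshold(z, 0.5)
    print(z, x, objective(x, z, 0.5))
```

For z = 0.3 and lam = 0.5 the minimizer is exactly 0.0, the one point where |x| is not differentiable. That is the mechanism behind the sparse solutions the lasso is known for.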

Yes, that was what I was saying. The absolute value and the 2-norm are fine thanks to subgradient techniques and theory, as well as their differentiability over most of their domain - but you can imagine tons of non-differentiable models where the subgradient is mostly useless, and there we generally use convex relaxations or other smoother analogs.

I don't think there was any misconception.