by ssivark
2198 days ago
The kinks correspond to a set of measure zero, which you will likely never hit during execution, so one can safely ignore the problem as not physically relevant. One way to think of it is that the cost function we're differentiating is approximate/fake, and is whatever it needs to be (in some special neighborhoods) to give us derivatives we consider sensible (in large regions). After all, there's nothing so special about the ReLU... It would be very weird/unstable if our algorithms worked for ReLU but not for a kink-smoothed version of ReLU.
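As a rough illustration of that last point, here is a minimal sketch comparing the derivative of ReLU with the derivative of a smoothed stand-in. The choice of softplus as the smoothed version, the sharpness parameter `beta`, and all function names are assumptions made for this example, not something the comment specifies.

```python
import math

def relu_grad(x):
    # Derivative of ReLU away from the kink; at x == 0 we arbitrarily return 0.
    return 1.0 if x > 0 else 0.0

def softplus_grad(x, beta=50.0):
    # Derivative of softplus_beta(x) = log(1 + exp(beta * x)) / beta,
    # which is the logistic sigmoid of beta * x (written in a stable form).
    z = beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Sample points that stay away from the kink at 0.
points = [-3.0, -1.0, -0.5, 0.5, 1.0, 3.0]
max_gap = max(abs(relu_grad(x) - softplus_grad(x)) for x in points)
print(max_gap)
```

Away from the kink the two derivatives are nearly indistinguishable, so an algorithm that only ever sees such points cannot tell the two functions apart.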
|
For a linear objective, say, over a compact feasible set, an optimum is attained at an extreme point of the feasible domain, and many of those are points where the constraint functions are not differentiable. And you can always turn optimization of a nonlinear objective f into optimization of a linear one: add a variable t and the constraint f(x) ≤ t, then move t into the objective.
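Written out, the reformulation described here is the standard epigraph trick. The sketch below assumes a minimization problem and uses X for the original feasible set, a symbol not in the comment:

```latex
\min_{x \in X} f(x)
\quad\Longleftrightarrow\quad
\min_{x \in X,\; t \in \mathbb{R}} \; t
\quad \text{subject to} \quad f(x) \le t
```

The new objective is linear in (x, t), and at an optimum the constraint f(x) ≤ t holds with equality, so any non-smoothness of f reappears on the boundary of the new feasible set.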
Now, I will agree that smooth optimization algorithms will work OK, but try minimizing abs(x) with fixed-step GD; you'll find that the best possible error you can achieve (other than by sheer luck) will be ~O(L), where L is your step size.
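A minimal sketch of that experiment, assuming plain (sub)gradient descent with a fixed step size; the starting points, step sizes, and function names are arbitrary choices for illustration:

```python
def sign(x):
    # A subgradient of abs(x); at x == 0 any value in [-1, 1] is valid, we use 0.
    return (x > 0) - (x < 0)

def gd_abs(x0, step, iters):
    # Fixed-step (sub)gradient descent on f(x) = abs(x).
    x = x0
    history = [x]
    for _ in range(iters):
        x = x - step * sign(x)
        history.append(x)
    return history

# Every update moves x by exactly `step`, so once the iterate is within
# `step` of 0 it hops back and forth across the minimum instead of converging.
coarse = gd_abs(x0=1.05, step=0.1, iters=50)
fine = gd_abs(x0=1.005, step=0.01, iters=300)

coarse_err = [abs(x) for x in coarse[-10:]]
fine_err = [abs(x) for x in fine[-10:]]
print(max(coarse_err), max(fine_err))
```

In both runs the final error stalls at a fixed fraction of the step size rather than shrinking toward zero, and cutting the step by 10x cuts the stalled error by about the same factor.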