|
|
|
|
|
by xanderlewis
786 days ago
|
|
It’s not ‘inaccurate’. The mark of true mastery is an ability to make terse statements that convey a huge amount without involving excessive formality or discussion of by-the-by technical details. If ever you’ve spoken to world-renowned experts in pure mathematics or other highly technical and pendantic fields, you’ll find they’ll say all sorts of ‘inaccurate’ things in conversation (or even in written documents). It doesn’t make them worthless; far from it. If you want to have a war of petty pedantry, let’s go: the derivative of ReLU can’t be discontinuous at zero, as you say, because continuity (or indeed discontinuity) of a function at x requires the function to have a value at x (which is the negation of what your first statement correctly claims). |
|
You might think that it doesn't matter because ReLU is, e.g., non-differentiable "only at one point".
Gradient based methods (what you find in pytorch) generally rely on the idea that gradients should taper to 0 in the proximity of a local optimum. This is not the case for non-differentiable functions, and in fact gradients can be made to be arbitrarily large even very close to the optimum.
As you may imagine, it is not hard to construct examples where simple gradient methods that do not properly take these facts into account fail to converge. These examples are not exotic.