by superdimwit 2198 days ago
In the same way that the ReLU derivative is not defined at x=0. Most of the time in practice this doesn't really matter, and you can still get gradient descent to work in a useful way.
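To make that concrete, here's a rough sketch in plain Python. The names (`relu_grad`, `fit`, the `at_zero` parameter) and the toy one-point problem are made up for illustration and aren't from any library. The idea is that you just pick a value for the "derivative" at x=0 (any number in [0, 1] is a valid subgradient, and 0 is a common choice) and carry on with gradient descent as usual.

```python
def relu(x):
    return max(0.0, x)


def relu_grad(x, at_zero=0.0):
    # The derivative is 1 for x > 0 and 0 for x < 0.
    # At x == 0 it is undefined, so we return a chosen convention value
    # (an assumption of this sketch; any value in [0, 1] is a subgradient).
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return at_zero


def fit(steps=200, lr=0.1, w=0.5):
    # Toy problem: minimise (relu(w * x) - y)**2 for one point x=1, y=2.
    x, y = 1.0, 2.0
    for _ in range(steps):
        pre = w * x
        err = relu(pre) - y
        # Chain rule, using the convention above wherever pre == 0.
        w -= lr * 2.0 * err * relu_grad(pre) * x
    return w
```

With these settings `fit()` should settle near w = 2. The `at_zero` choice only comes into play if the pre-activation is exactly 0, which doesn't happen on this run.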