Hacker News
by
smallnamespace
615 days ago
> using max(0, exp(x)-1) instead of exp(x)
Won't this cause the gradient to vanish on the left half, causing problems with training?
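To make the question concrete, here is a minimal sketch (the function names are made up for illustration, not taken from the quoted proposal) that checks the derivative of max(0, exp(x)-1) on both sides of zero, analytically and with a finite difference:

```python
import math

def shifted_exp_relu(x: float) -> float:
    # The activation quoted above: max(0, exp(x) - 1).
    return max(0.0, math.exp(x) - 1.0)

def shifted_exp_relu_grad(x: float) -> float:
    # Derivative is exp(x) for x > 0 and 0 for x < 0.
    # At x == 0 the function has a kink; choosing 0 there is an assumption.
    return math.exp(x) if x > 0 else 0.0

def numeric_grad(f, x: float, eps: float = 1e-6) -> float:
    # Central finite difference, used only as a sanity check.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

left_grad = shifted_exp_relu_grad(-2.0)             # flat region
right_grad = shifted_exp_relu_grad(1.0)             # growing region
left_numeric = numeric_grad(shifted_exp_relu, -2.0)
right_numeric = numeric_grad(shifted_exp_relu, 1.0)
```

For any negative input the output is clamped to zero, so both the analytic and the numeric gradient are exactly zero there, which is the same flat left half that ReLU has.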
espadrine
615 days ago
That concern is shared with ReLU. But since the weights are shared across the context/minibatch, perhaps it would not be an issue in practice, just as it usually isn't with ReLU.
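A rough way to see the shared-weights argument, as a toy sketch only: assume a single scalar weight `w` applied to every example in a batch (a hypothetical setup, far simpler than a real network). The gradient with respect to `w` is a sum over examples, so it vanishes only if every example lands in the flat region.

```python
import math

def act_grad(z: float) -> float:
    # Derivative of max(0, exp(z) - 1); taken as 0 at the kink z == 0.
    return math.exp(z) if z > 0 else 0.0

def weight_grad(w: float, inputs: list[float]) -> float:
    # Gradient of sum_i max(0, exp(w * x_i) - 1) with respect to the shared w.
    # By the chain rule, each example contributes act_grad(w * x_i) * x_i.
    return sum(act_grad(w * x) * x for x in inputs)

w = 0.5
mixed_batch = [-1.0, -0.5, 2.0]   # one example has a positive pre-activation
dead_batch = [-1.0, -0.5, -2.0]   # every pre-activation is negative

mixed = weight_grad(w, mixed_batch)  # only x = 2.0 contributes
dead = weight_grad(w, dead_batch)    # no example contributes
```

With the mixed batch the weight still receives a gradient from the one active example; only when the whole batch is on the left half does the update disappear, which is the "dead unit" case familiar from ReLU.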