HN Mirror



	by smallnamespace 615 days ago
	> using max(0, exp(x)-1) instead of exp(x)

	Won't this cause the gradient to vanish on the left half (x < 0), causing problems with training?

1 comment

espadrine 615 days ago

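ReLU shares that concern: its gradient is also zero for x < 0. But since the weights are shared across the context/minibatch, a weight still receives gradient whenever at least one input lands on the positive side, so perhaps it would not be an issue in practice, just as it usually isn't with ReLU.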
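
To make the exchange above concrete, here is a minimal sketch in plain Python. The function names (`act`, `act_grad`, `weight_grad`) and the toy single-weight setup are assumptions chosen for illustration, not anything from the original post. It shows that max(0, exp(x)-1) has zero gradient for x < 0, and that a weight shared across a batch still gets a nonzero gradient if any example has a positive pre-activation.

```python
import math

def act(x):
    # The activation under discussion: max(0, exp(x) - 1).
    return max(0.0, math.exp(x) - 1.0)

def act_grad(x):
    # Derivative: exp(x) for x > 0, and 0 for x < 0.
    # At x == 0 the function has a kink; we pick 0 there (a convention).
    return math.exp(x) if x > 0 else 0.0

def weight_grad(w, inputs):
    # Gradient of sum_i act(w * x_i) with respect to the shared weight w.
    # Each example contributes act_grad(w * x_i) * x_i.
    return sum(act_grad(w * x) * x for x in inputs)

w = 0.5                    # hypothetical shared weight
batch = [-2.0, -1.0, 3.0]  # hypothetical inputs in one minibatch

# Per-example contributions: zero wherever w * x is negative.
per_example = [act_grad(w * x) * x for x in batch]

# The summed gradient is still nonzero thanks to the third example.
total = weight_grad(w, batch)
print(per_example, total)
```

If every input in the batch gave a negative pre-activation, `total` would be zero and the weight would not update on that step, which is the same "dead unit" failure mode known from ReLU.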
