Hacker News
by
Hawkenfall
2404 days ago
A more in-depth paper on this found that the Swish activation often outperformed other activation functions such as ReLU:
https://arxiv.org/abs/1710.05941
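For anyone who wants to try it, Swish as defined in that paper is x * sigmoid(beta * x). Here is a rough Python sketch; the function names and the default beta = 1 are my own choices for illustration, not taken from any library:

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x). beta = 1 is an assumed default."""
    return x * sigmoid(beta * x)


def relu(x: float) -> float:
    """ReLU, for comparison."""
    return max(0.0, x)


if __name__ == "__main__":
    # Print both functions side by side on a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  swish={swish(x):.4f}")
```

Unlike ReLU, Swish dips slightly below zero for small negative inputs and approaches x for large positive ones.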
3 comments
osipov
2403 days ago
Most of the recent research is moving to GELU (Gaussian Error Linear Units) activation functions:
https://arxiv.org/pdf/1606.08415.pdf
excessive
2403 days ago
That's interesting. I didn't read the paper closely, but skipping to the pictures, it looks like ReLU, but smoothed out so the derivative is continuous. Intuitively, that seems useful.
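To check the "smoothed-out ReLU" intuition numerically, here is a small Python sketch. It assumes the exact form GELU(x) = x * Phi(x), with Phi the standard normal CDF; the helper names are made up for this example:

```python
import math


def gelu(x: float) -> float:
    """GELU(x) = x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x: float) -> float:
    """Derivative of GELU: Phi(x) + x * phi(x), with phi the normal PDF."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def relu_grad(x: float) -> float:
    """Derivative of ReLU away from 0 (taken as 0 at x == 0 here)."""
    return 1.0 if x > 0 else 0.0


if __name__ == "__main__":
    # Compare the two derivatives just left and right of zero.
    eps = 1e-6
    print("ReLU grad jump:", relu_grad(eps) - relu_grad(-eps))
    print("GELU grad jump:", gelu_grad(eps) - gelu_grad(-eps))
```

The ReLU derivative jumps from 0 to 1 across zero, while the GELU derivative passes smoothly through 0.5 there.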
rickdeveloper
2404 days ago
I wasn’t aware of that one. Definitely interesting, thanks for sharing!