by osipov 2404 days ago
Most recent research is moving to the GELU (Gaussian Error Linear Unit) activation function: https://arxiv.org/pdf/1606.08415.pdf
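For anyone who hasn't opened the paper: GELU is x * Phi(x), where Phi is the standard normal CDF, and the paper also gives a tanh-based approximation. Here's a rough plain-Python sketch of both. The function names `gelu` and `gelu_tanh` are my own, not from any library, and real frameworks ship their own vectorized versions.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU described in the paper."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Compare the exact form and the approximation at a few points.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  tanh_approx={gelu_tanh(x):+.4f}")
```

Unlike ReLU, the output is smooth and dips slightly below zero for small negative inputs before flattening out toward zero.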