by Rochus 1347 days ago
That's interesting. Is it still the case that the Rectified Linear Unit (ReLU) is the prevailing activation function in deep neural networks, because of the vanishing-gradient problem with activation functions like tanh? If so, the conclusions of the paper would apply to a very wide range of deep neural networks.
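For anyone unfamiliar with the vanishing-gradient argument, here is a minimal sketch. It assumes a toy scalar "network" x_{k+1} = f(x_k) with unit weights and no biases, which is a deliberate simplification and not how real networks are built. By the chain rule, the gradient of the output with respect to the input is the product of the local derivatives f'(x_k). For tanh every factor is below 1 (for x != 0), so the product shrinks as depth grows. For ReLU the factor is exactly 1 while the input stays positive. The helper names (`chain_gradient`, `tanh_grad`, `relu`, `relu_grad`) are hypothetical and exist only for this illustration.

```python
import math

def chain_gradient(activation, derivative, x0, depth):
    """Return d(x_depth)/d(x_0) for the scalar chain x_{k+1} = activation(x_k)."""
    x, grad = x0, 1.0
    for _ in range(depth):
        grad *= derivative(x)  # chain rule: multiply by the local derivative f'(x_k)
        x = activation(x)      # forward step to the next "layer"
    return grad

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2, which is strictly less than 1 for x != 0
    return 1.0 - math.tanh(x) ** 2

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # ReLU passes gradients through unchanged for positive inputs, blocks them otherwise
    return 1.0 if x > 0 else 0.0

for depth in (5, 20, 50):
    g_tanh = chain_gradient(math.tanh, tanh_grad, 1.0, depth)
    g_relu = chain_gradient(relu, relu_grad, 1.0, depth)
    print(f"depth={depth:3d}  tanh: {g_tanh:.3e}  relu: {g_relu:.1f}")
```

The tanh gradient keeps getting smaller as depth increases, while the ReLU gradient stays at 1.0 for this positive input. (The flip side, not shown here, is that a ReLU unit whose input goes negative passes no gradient at all.)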