| HN Mirror



	by Rochus 1347 days ago
	That's interesting. Is it still the case that the Rectified Linear Unit (ReLU) is the prevailing activation function in deep neural networks, because of the vanishing gradients that occur with activation functions like tanh? If so, the conclusions from the paper would apply to a very wide range of deep neural networks.
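	The vanishing-gradient argument can be illustrated with a toy sketch. It makes some assumptions that are not from the comment: a chain of scalar layers h_{k+1} = act(w * h_k), unit weight w = 1, and a positive starting input. Under those assumptions, backpropagation multiplies one local derivative per layer. The derivative of tanh is below 1 for any nonzero input, so the product shrinks as depth grows. ReLU's derivative is exactly 1 for positive inputs, so the product does not shrink.

```python
import math


def tanh_grad(x: float) -> float:
    # d/dx tanh(x) = 1 - tanh(x)^2: at most 1, and strictly below 1 for x != 0
    t = math.tanh(x)
    return 1.0 - t * t


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # d/dx max(0, x) is 1 for x > 0 and 0 otherwise (0 chosen at x == 0);
    # the zero branch is the "dead ReLU" case, which this sketch avoids
    return 1.0 if x > 0 else 0.0


def chain_gradient(act, grad, depth: int, x0: float = 0.5, w: float = 1.0) -> float:
    """Return d h_depth / d h_0 for the toy chain h_{k+1} = act(w * h_k).

    By the chain rule this is the product of the per-layer factors w * act'(w * h_k).
    """
    h, g = x0, 1.0
    for _ in range(depth):
        z = w * h
        g *= w * grad(z)  # multiply in this layer's local derivative
        h = act(z)        # forward pass to the next layer
    return g


if __name__ == "__main__":
    for depth in (1, 10, 50):
        print(depth,
              chain_gradient(math.tanh, tanh_grad, depth),
              chain_gradient(relu, relu_grad, depth))
```

	Real networks use weight matrices, initialization schemes, and normalization that change these numbers. The sketch shows only why a per-layer factor below 1 compounds with depth.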