| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by visarga 1292 days ago
	> matrix multiplication to Relus (just a max(x,0)) Transformers famously employ the Softmax activation inside the attention matrix. Very rare to see Softmax anywhere other than the final layer.