HN Mirror


	by areddyyt 691 days ago
	The non-linear layers, particularly the softmax(QK^T), will be crucial to getting ultra-low latency and high throughput. We're considering some custom silicon just for that portion of every transformer block.
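
For readers who want to see which operation is meant, here is a minimal pure-Python sketch of softmax(QK^T). It is an illustration only, not the commenter's implementation or hardware design. The function name `attention_weights` and the `scale` parameter are hypothetical, and the default 1/sqrt(d) scaling is an assumption borrowed from standard scaled dot-product attention; the comment itself only mentions softmax(QK^T).

```python
import math


def attention_weights(Q, K, scale=None):
    """Row-wise softmax(scale * Q K^T).

    Q and K are lists of rows (lists of floats) with the same row length d.
    Returns a len(Q) x len(K) matrix whose rows each sum to 1.
    """
    d = len(Q[0])
    if scale is None:
        # Assumption: the usual 1/sqrt(d) scaling; pass scale=1.0 for plain softmax(QK^T).
        scale = 1.0 / math.sqrt(d)

    weights = []
    for q in Q:
        # Linear part: one row of Q K^T, i.e. plain multiply-accumulate.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]

        # Non-linear part: a row-wise max, an exponential per element,
        # a row-wise sum, and a division per element.
        row_max = max(scores)  # subtracting the max avoids overflow in exp
        exps = [math.exp(s - row_max) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    for row in attention_weights(Q, K):
        print(row)
```

The score computation is ordinary multiply-accumulate work, while the softmax needs an exponential plus two reductions over each row (max and sum) before any output element is final. That dependency on the whole row is one plausible reason to treat this step separately from the matrix multiplies.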