HN Mirror



	by visarga 819 days ago
	> The computation that happens in the attention step you refer to is parallel. FlashAttention, which is widely used, is no longer parallel: the attention matrix is computed block by block.
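To make "block by block" concrete, here is a minimal sketch of the tiling idea behind FlashAttention for a single query vector. It uses pure Python instead of a GPU kernel and has no masking or batching. The function names `naive_attention` and `blockwise_attention` and the `block_size` parameter are hypothetical, chosen only for illustration. The blockwise version visits one block of keys/values at a time and keeps a running max and running sum (an "online softmax"), so it never stores the full row of attention scores. It still produces the same result as the naive version.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def naive_attention(q, keys, values):
    """Reference: build the full row of scores, softmax it, then mix the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)                      # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]

def blockwise_attention(q, keys, values, block_size=2):
    """Hypothetical sketch: process key/value blocks one at a time with an online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = float("-inf")          # max score seen so far
    running_sum = 0.0                    # sum of exp(score - running_max) so far
    acc = [0.0] * dim                    # unnormalized weighted sum of values
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [dot(q, k) * scale for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new max (exp(-inf) == 0.0 on the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]

q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, -0.2]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
ref = naive_attention(q, keys, values)
out = blockwise_attention(q, keys, values, block_size=2)
print(all(abs(a - b) < 1e-12 for a, b in zip(ref, out)))
```

In this sketch the loop over key/value blocks carries state (`running_max`, `running_sum`, `acc`) from one block to the next, so for a given query those blocks are handled sequentially. Different queries share no state here and could be computed independently.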