| HN Mirror



	by shawntan 816 days ago
	You're both kinda right. The type of computation that happens for that attention step that you refer to is parallel. I would say the thing that is "constant" is the depth of the computation graph (the number of sequential computation steps), which actually matters for computing certain functions: https://blog.wtf.sg/posts/2023-02-03-the-new-xor-problem/

1 comment

> The type of computation that happens for that attention step that you refer to is parallel

FlashAttention, which is widely used, is no longer parallel. The attention matrix is computed block by block rather than all at once.
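To make that concrete, here is a rough sketch of attention computed one chunk of keys and values at a time with a running ("online") softmax, checked against the all-at-once version. It is pure Python for a single query vector, not the actual FlashAttention kernel (which is a fused GPU kernel); the function names and `block_size` are made up for illustration.

```python
import math


def full_attention(q, keys, values):
    """Reference: softmax over all scores at once, then a weighted sum of values."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def blockwise_attention(q, keys, values, block_size=2):
    """Same output, but keys/values are visited one block at a time.

    Keeps a running max, a running softmax denominator, and a running
    weighted sum, rescaling the old state whenever the max changes.
    """
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale previous state to the new max (factor is 0.0 on the first block).
        scale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * scale + sum(weights)
        acc = [acc[d] * scale + sum(w * v[d] for w, v in zip(weights, v_blk))
               for d in range(dim)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

The loop over blocks is sequential for a given query, but the result matches the all-at-once computation, and different queries are still independent of each other. So the chunking changes how the work is scheduled, not the number of sequential steps between layers.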