by shawntan 816 days ago
You're both kinda right. The type of computation that happens for that attention step that you refer to is parallel. I would say the thing that is "constant" is the depth of the computation graph (the number of sequential computations), which actually matters for computing certain functions.

https://blog.wtf.sg/posts/2023-02-03-the-new-xor-problem/

1 comments

> The type of computation that happens for that attention step that you refer to is parallel

FlashAttention, which is widely used, is not fully parallel. The attention matrix is never materialized all at once; it is computed block by block, with a sequential loop over key/value blocks.
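
To make that concrete, here is a rough sketch of processing the attention scores in pieces for a single query, using a running ("online") softmax. This is not the actual FlashAttention kernel: the function names and `block_size` are made up for illustration, the usual 1/sqrt(d) scaling is left out, and real implementations still run many query blocks, heads, and batch elements in parallel on the GPU. The point is only that the loop over key/value blocks is sequential, yet gives the same result as the all-at-once version.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_attention(q, keys, values):
    """Reference: softmax over all scores at once, then a weighted sum of values."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def blockwise_attention(q, keys, values, block_size=2):
    """Same output, but keys/values are visited one block at a time.

    block_size is a hypothetical parameter for this sketch only.
    """
    dim = len(values[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax denominator, relative to running_max
    acc = [0.0] * dim         # unnormalized output, relative to running_max

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [dot(q, k) for k in k_blk]

        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old max (0.0 on the first block).
        scale = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]

        running_sum = running_sum * scale + sum(weights)
        acc = [a * scale + sum(w * v[j] for w, v in zip(weights, v_blk))
               for j, a in enumerate(acc)]
        running_max = new_max

    return [a / running_sum for a in acc]
```

Each iteration depends on the running max and sum from the previous one, which is the sequential part. The number of such steps grows with sequence length divided by block size.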