by visarga 819 days ago
> The type of computation that happens for that attention step that you refer to is parallel

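FlashAttention, which is widely used, is no longer fully parallel. The full attention matrix is never materialized; it is computed block by block, looping over tiles of keys and values while a running softmax is updated.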
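
A rough sketch of what I mean, in plain Python rather than a real GPU kernel. The function names and the `block` parameter are mine, purely for illustration, and the usual 1/sqrt(d) scaling is left out for brevity. The loop over key/value pieces is sequential and carries a running max and normalizer, while the query rows stay independent of each other:

```python
import math

def naive_attention(Q, K, V):
    """Reference: build each full score row, softmax it, then weight V."""
    d = len(V[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * V[j][c] for j in range(len(V))) / z
                    for c in range(d)])
    return out

def blockwise_attention(Q, K, V, block=2):
    """Same result, but K/V are consumed in pieces of size `block`."""
    d = len(V[0])
    out = []
    for q in Q:  # query rows are independent, so this loop could run in parallel
        m = -math.inf      # running max of the scores seen so far
        z = 0.0            # running softmax denominator
        acc = [0.0] * d    # running unnormalized output
        for start in range(0, len(K), block):  # sequential over K/V pieces
            Kb = K[start:start + block]
            Vb = V[start:start + block]
            scores = [sum(a * b for a, b in zip(q, k)) for k in Kb]
            m_new = max(m, max(scores))
            # Rescale what was accumulated under the old max.
            # On the first piece this is exp(-inf) == 0.0.
            scale = math.exp(m - m_new)
            w = [math.exp(s - m_new) for s in scores]
            z = z * scale + sum(w)
            acc = [acc[c] * scale + sum(w[j] * Vb[j][c] for j in range(len(Vb)))
                   for c in range(d)]
            m = m_new
        out.append([a / z for a in acc])
    return out
```

The two functions should agree up to floating-point error for any `block` size; the only difference is that the second one never holds a whole row of scores at once.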