Hacker News
by apophis-ren 502 days ago
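Flash attention is an implementation trick, not a different kind of attention: it computes the same exact softmax attention, just tiled so the full score matrix is never materialized. You can implement MHA or GQA, for example, with flash attention.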
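To make that concrete, here is a rough pure-Python sketch (not the actual FlashAttention kernel, which is a fused GPU implementation). It only illustrates the idea: a blockwise, online-softmax pass over K/V gives the same output as the naive computation, and the MHA/GQA choice is just about which K/V head each query head reads from. All function names, the block size, and the toy shapes are made up for illustration.

```python
import math
import random


def naive_attention(q, keys, values):
    """Reference: softmax(q . K^T / sqrt(d)) V for one query, all scores held at once."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def tiled_attention(q, keys, values, block=4):
    """Same math, but K/V are visited block by block with a running max and sum."""
    d = len(q)
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block):
        k_blk = keys[start:start + block]
        v_blk = values[start:start + block]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new max (0.0 on the first block).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


def grouped_query_attention(q_heads, k_heads, v_heads, attend):
    """One query vector per query head; query heads share K/V heads in groups.

    MHA is the special case len(k_heads) == len(q_heads).
    """
    group = len(q_heads) // len(k_heads)
    return [attend(q, k_heads[i // group], v_heads[i // group])
            for i, q in enumerate(q_heads)]


# Toy setup (arbitrary sizes): 4 query heads, 2 KV heads, 10 positions, head dim 8.
rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


q_heads = [rand_vec(8) for _ in range(4)]
k_heads = [[rand_vec(8) for _ in range(10)] for _ in range(2)]
v_heads = [[rand_vec(8) for _ in range(10)] for _ in range(2)]

out_naive = grouped_query_attention(q_heads, k_heads, v_heads, naive_attention)
out_tiled = grouped_query_attention(q_heads, k_heads, v_heads, tiled_attention)

# Largest elementwise difference between the two implementations.
max_diff = max(abs(a - b)
               for head_n, head_t in zip(out_naive, out_tiled)
               for a, b in zip(head_n, head_t))
print("max abs difference:", max_diff)
```

The difference should be at floating-point noise level. Swapping `naive_attention` for `tiled_attention` changes nothing about the attention variant, only how the softmax is computed.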