Hacker News
by jph00 1116 days ago
FlashAttention computes exact attention with memory linear in sequence length, rather than the quadratic memory of standard attention. https://github.com/HazyResearch/flash-attention
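For anyone wondering how linear memory is possible, here is a rough sketch of the online-softmax idea that makes it work, for a single query vector in plain Python. This is not the code from the linked repo, which is a fused GPU kernel. The names `naive_attention`, `blockwise_attention`, and `block_size` are made up for illustration, and the usual 1/sqrt(d) scaling is left out to keep it short.

```python
import math

def naive_attention(q, K, V):
    """Reference version: holds all N scores for the query at once."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d = len(V[0])
    return [sum(w * v[j] for w, v in zip(weights, V)) / total for j in range(d)]

def blockwise_attention(q, K, V, block_size=2):
    """Same output, but only one block of scores exists at any time.

    Hypothetical illustration of the running-max / running-sum trick,
    not the actual FlashAttention kernel.
    """
    d = len(V[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax denominator, relative to running_max
    acc = [0.0] * d           # unnormalised weighted sum of value rows
    for start in range(0, len(K), block_size):
        k_blk = K[start:start + block_size]
        v_blk = V[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum.
        # On the first block this is exp(-inf) == 0.0, which is harmless.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

The blockwise version keeps only a running max, a running sum, and a d-sized accumulator per query, so the extra memory does not grow with the number of keys. Across N queries that is O(N) instead of the O(N^2) score matrix.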