Hacker News
Jagged Flash Attention Optimization (shaped.ai)
24 points by tullie 461 days ago
2 comments

Flash attention natively supports packing multiple variable-length sequences into a single call. What is the advantage of jagged flash attention?
If only there was a link to a page somewhere that could answer this question for you.
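The packing the question refers to can be sketched as follows. This is a minimal, non-fused pure-Python reference. It assumes the common layout in which every sequence is concatenated into one flat buffer and delimited by cumulative offsets; the flash-attn library's varlen interface calls these offsets `cu_seqlens`. The names `cumulative_offsets` and `packed_attention` are hypothetical. The sketch only shows the math that packing implies (no padding, and each token attends only within its own sequence). It does not reproduce the fused kernel or the jagged implementation described in the linked article.

```python
import math
from typing import List

Vector = List[float]


def cumulative_offsets(lengths: List[int]) -> List[int]:
    """Turn per-sequence lengths into offsets like cu_seqlens: [0, l0, l0+l1, ...]."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def packed_attention(q: List[Vector], k: List[Vector], v: List[Vector],
                     offsets: List[int]) -> List[Vector]:
    """Hypothetical reference attention over packed variable-length sequences.

    q, k, v hold every token of every sequence back to back, with no padding.
    Tokens offsets[i]:offsets[i+1] belong to sequence i, and each query attends
    only to keys of its own sequence (a block-diagonal mask).
    """
    assert len(q) == len(k) == len(v) == offsets[-1]
    if not q:
        return []
    scale = 1.0 / math.sqrt(len(q[0]))
    out: List[Vector] = []
    for start, end in zip(offsets, offsets[1:]):
        for i in range(start, end):
            scores = [dot(q[i], k[j]) * scale for j in range(start, end)]
            m = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - m) for s in scores]
            total = sum(weights)
            row = [0.0] * len(v[0])
            for w, j in zip(weights, range(start, end)):
                for d in range(len(row)):
                    row[d] += (w / total) * v[j][d]
            out.append(row)
    return out


# Two sequences of lengths 2 and 1 packed into one buffer of 3 tokens.
offsets = cumulative_offsets([2, 1])
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
out = packed_attention(q, k, v, offsets)
```

Because the third token is alone in its sequence, it attends only to itself and its output equals its own value row. The first two tokens never see the `[5.0, 5.0]` value from the other sequence, even though all three share one buffer.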