Hacker News
by platers 455 days ago
Flash attention natively supports packing multiple variable-length sequences into a single call. What is the advantage of jagged flash attention?
1 comment

If only there was a link to a page somewhere that could answer this question for you.