Hacker News
by ex3ndr 700 days ago
I am wondering why flash attention is about 5x slower with variable masking than without it. The lack of good masking support almost cancels out the optimizations.
1 comment

Where are you seeing these benchmarks?