Hacker News
by big-chungus4 27 days ago
It optimized the Extend Attention operator in Triton. All the models were optimizing the same operator.