by projektfu 8 days ago
"At 1M tokens, SubQ 1.1 Small requires 64.5x less compute than dense attention and runs 56x faster than FlashAttention-2."

6450% less compute? Is Trump working there?
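
For anyone who wants the arithmetic behind the joke: a rough sketch, assuming "64.5x less" means the new cost is the old cost divided by 64.5. The helper name `percent_reduction` is made up for illustration and is not from the quoted announcement.

```python
def percent_reduction(factor: float) -> float:
    """Percent reduction implied by 'factor-times less', assuming new = old / factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return (1.0 - 1.0 / factor) * 100.0


# Naive reading: treat the multiplier itself as a percentage.
naive = 64.5 * 100

# Reading it as a ratio: the reduction can never reach 100%.
actual = percent_reduction(64.5)

print(f"naive: {naive:.0f}%  actual: {actual:.2f}%")
```

Under that assumption the reduction comes out a bit under 98.5%, which is why "N times less" and "N00% less" are not interchangeable.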