by _0ffh 1 hour ago
Lookahead Sparse Attention should be playing a big role as well, since it dramatically slashes memory consumption.