Hacker News
Running LLMs with 3.3M Context Tokens on a Single GPU (arxiv.org)
14 points by Van_Chopiszt 611 days ago
1 comment

Their demo looks really cool: https://github.com/mit-han-lab/duo-attention