Hacker News
Running LLMs with 3.3M Context Tokens on a Single GPU (arxiv.org)
14 points by Van_Chopiszt 611 days ago | 1 comment
charlie_xxx 611 days ago
Their demo looks really cool:
https://github.com/mit-han-lab/duo-attention