Hacker News
by antoniuschan99 81 days ago
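It could turn a 1M-context system into a 4M-context system. TurboQuant-style KV-cache compression makes longer context windows cheaper to serve, because the KV cache grows linearly with context length and storing each cached value in fewer bits lets more tokens fit in the same memory. I'm not sure exactly how large the increase in context size would be in practice, though.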
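A rough back-of-envelope sketch of where a "1M to 4M" figure could come from. It assumes a fixed memory budget spent entirely on the KV cache, a 16-bit baseline, and that compression only changes bits per cached value. The model shape is hypothetical, and real schemes add overhead (scales, codebooks) and still need memory for weights and activations, so the actual gain would be smaller than this ideal ratio.

```python
def kv_cache_bits(tokens, layers, kv_heads, head_dim, bits_per_value):
    """Bits needed to cache keys and values for `tokens` positions."""
    # Factor of 2: one key vector and one value vector per KV head per layer.
    values = 2 * layers * kv_heads * head_dim * tokens
    return values * bits_per_value


def max_context(budget_bits, layers, kv_heads, head_dim, bits_per_value):
    """Largest token count whose KV cache fits in the budget (ignores any overhead)."""
    per_token = kv_cache_bits(1, layers, kv_heads, head_dim, bits_per_value)
    return budget_bits // per_token


# Hypothetical model shape, not any specific model.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

# Budget sized so that a 16-bit KV cache holds exactly 1M tokens.
budget = kv_cache_bits(1_000_000, LAYERS, KV_HEADS, HEAD_DIM, 16)

for bits in (16, 8, 4, 3):
    tokens = max_context(budget, LAYERS, KV_HEADS, HEAD_DIM, bits)
    print(f"{bits:>2}-bit KV cache: {tokens:,} tokens")
```

Under these assumptions the context that fits scales as 16 / bits, so going from 16-bit to 4-bit values gives the 4x in the comment, and about 3 bits would give roughly 5.3x; the model shape cancels out of the ratio.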