by antoniuschan99 81 days ago
It could turn a 1M-context system into a 4M-context system. TurboQuant-style KV-cache compression shrinks the memory each cached token needs, which makes longer context windows cheaper to serve. I'm not sure exactly how much of an increase in context size it would actually give, though.
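For a rough sense of where a 4x figure could come from, here is a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128) and the 16-bit to 4-bit change are assumptions picked for illustration, not numbers from TurboQuant or any specific model. It also ignores per-block scales and other quantization metadata, so a real gain would likely be somewhat smaller.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bits_per_value):
    """Bytes needed to cache keys and values for `tokens` tokens."""
    # Factor of 2: one K tensor and one V tensor per layer.
    total_bits = 2 * layers * kv_heads * head_dim * tokens * bits_per_value
    return total_bits // 8


def max_tokens(budget_bytes, layers, kv_heads, head_dim, bits_per_value):
    """Largest token count whose KV cache fits in `budget_bytes`."""
    bits_per_token = 2 * layers * kv_heads * head_dim * bits_per_value
    return (budget_bytes * 8) // bits_per_token


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

# Memory a 1M-token cache would need at 16 bits per value.
budget = kv_cache_bytes(1_000_000, LAYERS, KV_HEADS, HEAD_DIM, 16)

# Tokens that fit in the same memory at an assumed 4 bits per value.
tokens_at_4_bits = max_tokens(budget, LAYERS, KV_HEADS, HEAD_DIM, 4)

print(f"16-bit cache for 1M tokens: {budget / 1e9:.1f} GB")
print(f"Tokens in the same budget at 4 bits: {tokens_at_4_bits:,}")
```

Under these assumptions the ratio is just 16 / 4 = 4, independent of the model shape. The actual multiplier depends on how many bits per value the compression ends up using once its overhead is counted.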