Hacker News
by btown 4 days ago
> a few GB per user at scale

While this might seem to be true for casual users, I recall that one of the reasons for Anthropic's recent change to retain the KV cache for only an hour or so was that many users keep one massive ongoing session and continue it with multiple unrelated queries (as one would in a single-thread "group chat"). That is hard to distinguish from someone who wants the existing context applied to a seemingly unrelated query, for tone and so on.

So in practice, many casual users are typing their Google-esque searches against a 100k+ token context window, and that's the point where things balloon into 300GB+ KV caches to maintain.
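
For a rough sense of where a number like that could come from, here's a back-of-the-envelope sketch. The layer count, head counts, head dimension, and fp16 storage are all hypothetical values I picked for illustration, not any vendor's actual configuration, and `kv_cache_bytes` is just a helper name for this example.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size for one sequence.

    Each token stores one key and one value vector per KV head per layer.
    bytes_per_value=2 assumes fp16/bf16 storage (an assumption).
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # 2 = K and V
    return tokens * per_token


# Hypothetical model shapes, chosen only to show the scaling.
TOKENS = 100_000
full_mha = kv_cache_bytes(TOKENS, layers=80, kv_heads=64, head_dim=128)
gqa = kv_cache_bytes(TOKENS, layers=80, kv_heads=8, head_dim=128)

print(f"full multi-head attention: {full_mha / 1e9:.0f} GB")
print(f"grouped-query attention:   {gqa / 1e9:.0f} GB")
```

Under these assumed shapes, a 100k-token session lands in the hundreds of GB only if every attention head keeps its own keys and values; sharing KV heads across query heads shrinks it by the same ratio. Either way the cost grows linearly with the tokens kept in context.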

I wouldn't be surprised if we see new UXs around subsidized plans that encourage resetting the context window more often.

1 comment

300GB of context for a single session is huge though. Modern local models max out at a whole lot less than that.