|
|
|
|
|
by btown
4 days ago
|
|
> a few GB per user at scale While this might seem to be true for casual users, I recall that one of the reasons for Anthropic's recent changes for only retaining KV cache for an hour or so, was that many users just have one massive ongoing session that they continue on with multiple unrelated queries (as one would in a single-thread "group chat"). And this is hard to distinguish from someone who wants that context for their seemingly-unrelated query to apply tone etc. So in practice, there are many casual users who are typing their Google-esque searches against a 100k+ token context window - and it's at that point where things balloon into 300GB+ KV caches to maintain. I wouldn't be surprised if we see new UX's around subsidized plans starting to encourage resetting the context window more often. |
|