|
|
|
|
|
by jetbalsa
49 days ago
|
|
The cache is stored on Antropics servers, since its a save state of the LLM's weights at the time of processing. its several gigs in size. Every SINGLE TIME you send a message and its a cache miss you have to reprocess the entire message again eating up tons of tokens in the process |
|
So then it comes to what you're talking about, which is processing the entire text chain which is a different kind of cache, and generating the equivelent tokens are what's being costed.
But once you realize the efficiency of the product in extended sessions is cached in the immediate GPU hardware, then it's obvious that the oversold product can't just idle the GPU when sessions idle.