|
|
|
|
|
by cowwoc2020
67 days ago
|
|
Makes me wish that shortly before the server-side expiration, we could save the cache on the client-side, indefinitely. But my understanding is that we're talking about ~60GB of data per session, so it sounds unrealistic to do... |
|
I think the primary reason they cannot do this is that they change the memory and communication layouts in their serving stack rather aggressively. And naturally keeping the KV cache portable across all such layouts is a very difficult task. So you'd have to version the cache down to a specific deployment, and invalidate it the moment anything even small changes. So giving the user a handle to the cache sort of prevents you from making large changes to memory layout. Which is I suppose not that enticing. Also, client side KV caches are only meaningful in today's 1M contexts. Few y back it wasn't necessary, since just recomputing would be better for everybody.
To be clear, I don't mean they send it along with every request. Rather, they do their current TTL cache, and then when I'm at the end of a session, I request it in one shot and then close the session. And it doesn't have to come to the literal client, they can egress it to a storage service that we pay for, whatever. But ya the compat problem makes it all a non starter.