by lostmsu
129 days ago
> The prompt cache caches KV Cache states

Yes. The cache that caches KV cache states is called the KV cache. The "prompt cache" is just an index from string prefixes into the KV cache. It's tiny and has no computational impact. The parent was correct to question you.

The cost of using it comes from two things combined: later tokens need more compute to calculate, and the KV cache entries have to be kept somewhere between requests from the same user while the system processes requests from other users.
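To put a rough number on the "keep KV cache entries somewhere" part, here is a back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128, 2-byte elements) and the 10,000-token prefix are made-up illustrative values, not any particular provider's model.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache one token occupies: keys and values for every layer."""
    # Factor 2 covers one key vector and one value vector per layer per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


# Hypothetical model shape, chosen only for illustration.
per_token = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)

# Hypothetical cached prefix length for one user.
prefix_tokens = 10_000
total = per_token * prefix_tokens

print(f"{per_token / 1024:.0f} KiB per token")       # 128 KiB per token
print(f"{total / 2**30:.2f} GiB for the prefix")     # 1.22 GiB for the prefix
```

Under these assumptions a single idle user's prefix pins over a gigabyte of accelerator or host memory, which is the resource being paid for while other users' requests run.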
|
I think the OpenAI docs are pretty useful for an API-level understanding of how it can work (https://developers.openai.com/api/docs/guides/prompt-caching...). The vLLM docs (https://docs.vllm.ai/en/stable/design/prefix_caching/) and SGLang's radix tree approach (https://lmsys.org/blog/2024-01-17-sglang/) are useful for insights into how to implement it locally on a single compute node.
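For a feel of what a local prefix index might look like, here is a minimal sketch loosely in the spirit of hash-based prefix caching: tokens are split into fixed-size blocks, and each block's hash also covers its parent's hash, so one hash identifies the whole prefix up to that block. `PrefixIndex`, `block_hashes`, the block size of 4, and the string "handles" are all hypothetical names and values for illustration, not the vLLM or SGLang API.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; chosen small so the example is easy to follow


def block_hashes(token_ids, block_size=BLOCK_SIZE):
    """Return one chained hash per full block of tokens."""
    hashes = []
    parent = b""
    for i in range(len(token_ids) // block_size):
        block = tuple(token_ids[i * block_size:(i + 1) * block_size])
        # Chaining in the parent hash ties each block to everything before it.
        h = hashlib.sha256(parent + repr(block).encode()).digest()
        hashes.append(h)
        parent = h
    return hashes


class PrefixIndex:
    """Maps chained block hashes to opaque KV block handles."""

    def __init__(self):
        self._blocks = {}

    def insert(self, token_ids, handles):
        # Only full blocks are indexed; a trailing partial block is ignored.
        for h, handle in zip(block_hashes(token_ids), handles):
            self._blocks.setdefault(h, handle)

    def match(self, token_ids):
        """Return handles for the longest cached prefix, in whole blocks."""
        hit = []
        for h in block_hashes(token_ids):
            handle = self._blocks.get(h)
            if handle is None:
                break
            hit.append(handle)
        return hit


index = PrefixIndex()
index.insert([1, 2, 3, 4, 5, 6, 7, 8, 9], ["kv0", "kv1"])

print(index.match([1, 2, 3, 4, 5, 6, 7, 8, 42, 43, 44, 45]))  # ['kv0', 'kv1']
print(index.match([1, 2, 3, 4, 9, 9, 9, 9]))                  # ['kv0']
print(index.match([0, 2, 3, 4, 5, 6, 7, 8]))                  # []
```

The index itself is just a small dictionary; the expensive part is the KV blocks the handles point to. A radix tree over token sequences is an alternative structure for the same lookup and avoids fixed block boundaries.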