by samwho
177 days ago
With KV caching as it’s described there, it has to be a prefix match. OpenAI state in their docs that they don’t cache anything below 1024 tokens long, and I’m sure I read somewhere that they only cache in 1024-token blocks (so 1024, 2048, 3072, etc.), but I can’t find it now. There’s been some research into caching chunks in the middle of a prompt, but I don’t think any of the providers are doing it yet because it needs the prompt to be structured in a very specific way.
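
To make "prefix match" concrete, here is a rough sketch in Python. It is not how any provider actually implements caching: the function names are made up for illustration, tokens are just integers, the 1024 minimum comes from the docs mentioned above, and the `block` rounding is only my unconfirmed recollection, so it defaults to off.

```python
from typing import Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def reusable_tokens(
    cached: Sequence[int],
    prompt: Sequence[int],
    min_tokens: int = 1024,  # minimum from the docs
    block: int = 1,          # hypothetical block size; 1 means no rounding
) -> int:
    """Return how many prompt tokens could reuse cached KV entries.

    Only the shared *prefix* counts. Identical tokens after the first
    mismatch are ignored, because their KV values depend on everything
    that came before them.
    """
    n = shared_prefix_len(cached, prompt)
    if n < min_tokens:
        return 0
    return n - (n % block)  # round down to a whole number of blocks


cached = list(range(2000))

# Same first 1500 tokens, one changed token, then an identical tail.
edited = list(range(1500)) + [-1] + list(range(1501, 2000))
print(reusable_tokens(cached, edited))              # 1500
print(reusable_tokens(cached, edited, block=1024))  # 1024

# Changing only the very first token loses everything.
print(reusable_tokens(cached, [-1] + cached[1:]))   # 0
```

The last case is why the advice is usually to put the stable part of the prompt (system prompt, tools, long documents) first and the variable part last.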
> Caching is available for prompts containing 1024 tokens or more.
No mention of caching being in blocks of 1024 tokens thereafter.