Hacker News new | ask | show | jobs
by samwho 177 days ago
With KV caching as it’s described there it has to be a prefix match. OpenAI state in their docs they don’t cache anything below 1024 tokens long, and I’m sure I read somewhere that they only cache in 1024 token blocks (so 1024, 2048, 3072, etc) but I can’t find it now.

There’s been some research into how to cache chunks in the middle, but I don’t think any of the providers are doing it yet because it needs the prompt to be structured in a very specific way.

1 comments

https://platform.openai.com/docs/guides/prompt-caching#requi...

> Caching is available for prompts containing 1024 tokens or more.

No mention of caching being in blocks of 1024 tokens thereafter.

At launch it was described as being in blocks of 128

https://openai.com/index/api-prompt-caching/