HN Mirror



	by samwho 177 days ago
	With KV caching as it’s described there, it has to be a prefix match. OpenAI state in their docs that they don’t cache prompts shorter than 1024 tokens, and I’m sure I read somewhere that they only cache in 1024-token blocks (so 1024, 2048, 3072, etc.), but I can’t find it now. There’s been some research into caching chunks in the middle of a prompt, but I don’t think any of the providers are doing it yet because it needs the prompt to be structured in a very specific way.
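To make the prefix-match point concrete, here is a minimal sketch in Python. In a causal model, the keys and values at a given position depend on every token before it, so only the leading run of identical tokens can be reused. The function name and the token IDs are made up for illustration; this is not any provider's actual implementation.

```python
from typing import Sequence


def shared_prefix_len(cached: Sequence[int], prompt: Sequence[int]) -> int:
    """Count the leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break  # everything after the first mismatch must be recomputed
        n += 1
    return n


# Hypothetical token IDs, purely illustrative.
previous = [11, 22, 33, 44, 55]
same_start = [11, 22, 33, 99, 55]     # diverges at index 3
changed_start = [99, 22, 33, 44, 55]  # diverges at index 0

print(shared_prefix_len(previous, same_start))
print(shared_prefix_len(previous, changed_start))
```

Note that `changed_start` shares four of its five tokens with `previous`, yet none of them count, because the mismatch is at the very first position.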

1 comment

> Caching is available for prompts containing 1024 tokens or more.

No mention of caching being in blocks of 1024 tokens thereafter.

At launch it was described as being in blocks of 128 tokens.
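As a rough illustration of the arithmetic being discussed, the sketch below assumes a 1024-token minimum with cache hits growing in fixed-size increments after that. The function name and the rounding rule are assumptions based only on what is quoted in this thread, not a documented algorithm.

```python
def cached_tokens(prefix_len: int, min_tokens: int = 1024, block: int = 128) -> int:
    """Estimate how many prefix tokens could be served from cache.

    Assumed rule: nothing is cached below `min_tokens`; above it, the
    cached length grows in whole steps of `block` tokens.
    """
    if prefix_len < min_tokens:
        return 0
    extra_blocks = (prefix_len - min_tokens) // block
    return min_tokens + extra_blocks * block


for n in (1000, 1024, 1200, 3000):
    # Compare 128-token increments with the 1024-token blocks suggested above.
    print(n, cached_tokens(n), cached_tokens(n, block=1024))
```

Under these assumptions a 3000-token shared prefix would reuse 2944 tokens with 128-token increments but only 2048 with 1024-token blocks, which is why the block size matters for longer prompts.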