> Yes, but presumably the authors are suggesting broader application than just caching a system prompt

The authors of the OP paper, "Can I Buy Your KV Cache?", explicitly set aside any KV reuse that isn't rooted at position 0:

>> We deliberately study the simplest, safe form: a document treated as a shared prefix, with continuations appended after it

So no, I really think it's just prefix caching. That's actually far from the weirdest thing about that paper: they go on to "prove" that decoding from cached prefill gets the same result as prefilling and decoding on the same content, which... yes. That is how computation works.
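
To make the "that is how computation works" point concrete, here's a toy sketch (not the paper's setup): a single causal attention head in plain Python, with made-up random weights and embeddings. The names `forward`, `attend`, `D`, and the sizes are all assumptions for illustration, and positional encodings are left out. It prefills a "document", keeps its K/V, then runs a continuation both from the cached prefix and from scratch.

```python
import math
import random

D = 4  # toy head dimension (assumption, chosen only for illustration)

def matvec(W, x):
    """Multiply a DxD matrix (list of rows) by a length-D vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Softmax attention of one query over all keys/values seen so far."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [sum(e / z * v[i] for e, v in zip(exps, values)) for i in range(D)]

def forward(xs, Wq, Wk, Wv, cache=None):
    """Causal attention over xs, optionally starting from a cached prefix.

    The cache is copied, so one prefix can serve many continuations.
    """
    keys = list(cache[0]) if cache else []
    values = list(cache[1]) if cache else []
    outs = []
    for x in xs:
        keys.append(matvec(Wk, x))
        values.append(matvec(Wv, x))
        outs.append(attend(matvec(Wq, x), keys, values))
    return outs, (keys, values)

rng = random.Random(0)

def rand_matrix():
    return [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]

def rand_tokens(n):
    return [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(n)]

Wq, Wk, Wv = rand_matrix(), rand_matrix(), rand_matrix()
document = rand_tokens(6)       # the shared prefix
continuation = rand_tokens(3)   # tokens appended after it

# Path 1: recompute everything from position 0.
full_out, _ = forward(document + continuation, Wq, Wk, Wv)

# Path 2: prefill the document once, then continue from its cached K/V.
_, prefix_cache = forward(document, Wq, Wk, Wv)
cached_out, _ = forward(continuation, Wq, Wk, Wv, cache=prefix_cache)

same = full_out[len(document):] == cached_out
print(same)
```

Both paths perform the same arithmetic in the same order for the continuation tokens, so the outputs match exactly here. In a real inference stack, kernel and batching differences may introduce small floating-point deviations, but the equivalence itself follows from causal masking: a prefix token's K/V never depends on what comes after it.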

Also, the thing they describe already exists: you pay your provider for their cache implementation as part of your token ingress costs. What is that if not paying for cached KV?
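
As a back-of-the-envelope sketch of what "paying for cached KV" looks like on a bill: some providers charge input tokens served from their prefix cache at a different rate than uncached ones. The function name and both rates below are hypothetical, in arbitrary integer price units, and don't reflect any particular provider's pricing.

```python
def input_cost(prompt_tokens, cached_tokens, price_uncached, price_cached):
    """Input-side cost when `cached_tokens` of the prompt hit the prefix cache.

    Prices are per token, in arbitrary integer units (hypothetical values).
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    return uncached_tokens * price_uncached + cached_tokens * price_cached

# Hypothetical rates: 10 units per uncached token, 1 unit per cached token.
cold = input_cost(10_000, 0, price_uncached=10, price_cached=1)
warm = input_cost(10_000, 9_000, price_uncached=10, price_cached=1)
print(cold, warm)
```

With these made-up numbers, a 10,000-token prompt whose first 9,000 tokens are a cached document costs 19,000 units instead of 100,000. Whatever the real rates are, the cached-token line item is the price of someone else's prefilled KV.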