by dannyw 6 days ago
Yeah, I'm really not sure what the point of this paper is. Every non-toy environment does prefix caching.

Yes, but presumably the authors are suggesting broader application than just caching a system prompt.

The paper's approach should work well if (a) you can calculate KV(A || B) as a function of KV(A) and KV(B) independently, (b) you can identify which documents A1, A2, A3, ... are used commonly enough to be worth caching, and (c) it is cheaper to buy and sell KV(A) on a market than to compute KV(A) when it is needed. Given the size of KV(A) I am not sure that (c) will become true even if people solve the open research problem represented by (a) and accept the state-of-the-art trade-offs known for (b).
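For a rough sense of scale on (c), here is a back-of-envelope sketch of how large KV(A) gets. The model shape (32 layers, 8 KV heads, head dimension 128, fp16) and the 1 Gbit/s link are hypothetical numbers picked for illustration; they are not taken from the paper or from any particular provider.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    """Size of a KV cache: one K and one V vector per token, per layer, per KV head."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return n_tokens * per_token


def transfer_seconds(n_bytes, bytes_per_second):
    """Time to ship the cache over a link with the given sustained bandwidth."""
    return n_bytes / bytes_per_second


# Hypothetical model shape and document length (assumptions, not measurements).
size = kv_cache_bytes(
    n_tokens=10_000, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2
)
link = 125e6  # assumed 1 Gbit/s link, expressed in bytes per second

print(f"KV(A) size: {size / 2**30:.2f} GiB")
print(f"transfer time: {transfer_seconds(size, link):.1f} s")
```

Under these assumptions a 10,000-token document costs 128 KiB per token, so well over a gigabyte in total. Whether moving that is cheaper than recomputing the prefill depends on hardware and bandwidth that the sketch does not model.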

> Yes, but presumably the authors are suggesting broader application than just caching a system prompt

The authors of the OP paper "Can I Buy Your KV Cache?" explicitly set aside any KV cache that does not start at position 0:

>> We deliberately study the simplest, safe form: a document treated as a shared prefix, with continuations appended after it

So no, I really think it's just prefix caching. That's actually far from the weirdest thing about that paper: they go on to "prove" that decoding from cached prefill gets the same result as prefilling and decoding on the same content, which... yes. That is how computation works.
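To make both points concrete, here is a toy NumPy sketch (an illustration only, not the paper's code): a hypothetical two-layer, single-head causal attention stack with random weights and no positional encoding. It checks that continuing from a cached prefix reproduces the full computation, and that the KV for B computed in isolation does not match the KV for B inside A || B, which is the open problem labelled (a) above.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8          # toy hidden size (assumption)
N_LAYERS = 2   # two layers, so deeper K/V depend on earlier context
W = [
    {name: rng.standard_normal((D, D)) / np.sqrt(D) for name in ("q", "k", "v")}
    for _ in range(N_LAYERS)
]


def forward(x, cache=None):
    """Run the toy causal attention stack over new tokens x of shape (T, D).

    cache is a list of (K, V) per layer for the tokens that came before x,
    or None. Returns the outputs for x and the updated per-layer cache.
    """
    new_cache = []
    h = x
    for i, w in enumerate(W):
        q, k, v = h @ w["q"], h @ w["k"], h @ w["v"]
        if cache is not None:
            # Reuse K/V of the prefix instead of recomputing them.
            k = np.concatenate([cache[i][0], k])
            v = np.concatenate([cache[i][1], v])
        new_cache.append((k, v))
        t_new, t_all = q.shape[0], k.shape[0]
        offset = t_all - t_new
        scores = q @ k.T / np.sqrt(D)
        # Causal mask: new token j sits at absolute position offset + j
        # and may only attend to keys at positions <= offset + j.
        mask = np.arange(t_all)[None, :] > (offset + np.arange(t_new))[:, None]
        scores = np.where(mask, -np.inf, scores)
        scores -= scores.max(axis=1, keepdims=True)
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)
        h = h + p @ v  # residual connection
    return h, new_cache


A = rng.standard_normal((5, D))  # shared "document" prefix
B = rng.standard_normal((3, D))  # continuation

full_out, full_cache = forward(np.concatenate([A, B]))
_, cache_A = forward(A)
cached_out, _ = forward(B, cache=cache_A)
_, cache_B_alone = forward(B)

# Prefix caching: continuing from KV(A) matches the full computation.
prefix_matches = np.allclose(full_out[len(A):], cached_out)

# Independent caching: second-layer keys for B alone vs. B inside A || B.
independent_matches = np.allclose(full_cache[1][0][len(A):], cache_B_alone[1][0])

print("prefix cache matches full run:", prefix_matches)
print("KV(B) alone matches KV(B) in context:", independent_matches)
```

The first check holds by construction because of the causal mask, which is the sense in which the "proof" is unsurprising. The second fails because B's second-layer keys are built from first-layer outputs that already attended to A.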

Also, the thing they describe already exists: you pay your provider for their cache implementation as part of your token ingress costs. What is that if not paying for cached KV?