
Well, it’s one flaw. I would argue that the bigger flaw, which you alluded to, is that the cost of computing the cache yourself maxes out in the single-digit dollars even for very large frontier models, and that’s a one-time cost. Even if you imagine all the logistics are free and all the transfers are instant, what are we even talking about here from an economic perspective?
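To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the cost of building the cache is simply one prefill pass billed at a per-token input price; the function name, the 500k-token document, and the $5 per million tokens rate are all made-up illustrative values, not real pricing for any provider.

```python
def prefill_cost_usd(prompt_tokens: int, usd_per_million_input_tokens: float) -> float:
    """One-time cost of running a document through the model to build its KV cache.

    Assumes cost scales linearly with input tokens (a simplification).
    """
    return prompt_tokens / 1_000_000 * usd_per_million_input_tokens


# Hypothetical inputs, for illustration only.
cost = prefill_cost_usd(prompt_tokens=500_000, usd_per_million_input_tokens=5.0)
print(f"${cost:.2f}")  # prints "$2.50"
```

Under those assumptions, a seller's cache has to beat a one-time outlay of a couple of dollars, before counting any transfer or storage cost on their side.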

KV caching is a super interesting engineering space, especially when you’re talking about local models where compute and memory bandwidth are highly constrained and you’re trying to trim fractions of a second everywhere you can by flipping between different ICL prefixes. But selling caches for specific documents just makes no sense at all.