by SEGyges 376 days ago
Every LLM provider caches its KV cache between requests. It's a publicly documented technique (basically, stuff the KV into Redis after each request), and a good engineering team could set it up in a month.

Are you saying that if I send the prompt "foo" and a month later another user sends "foo", it retrieves a cached value?
No, the key-value cache isn't a cache of responses. It's the context itself, stored in a form the model can read directly.
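To make the distinction concrete, here is a rough toy sketch of prefix-style KV caching, not any provider's actual implementation. Everything in it is hypothetical: `PrefixKVStore` is a made-up name, a plain dict stands in for something like Redis, and `_compute_kv` fakes the per-token key/value tensors with a hash of the prefix. The only point is that what gets stored is per-token state for a prefix of the context, so a later request that shares the prefix skips recomputing it, while the response is still generated fresh.

```python
import hashlib


class PrefixKVStore:
    """Toy prefix cache. A dict stands in for an external store (assumption)."""

    def __init__(self):
        self._store = {}
        self.computed = 0  # how many per-token KV entries were actually computed

    def _key(self, tokens):
        # Cache key is derived from the exact token prefix.
        return hashlib.sha256("\x00".join(tokens).encode()).hexdigest()

    def _compute_kv(self, tokens, i):
        # Placeholder for the model's key/value tensors for token i.
        # It depends on the whole prefix tokens[:i+1], mimicking causal attention.
        self.computed += 1
        digest = hashlib.sha256(self._key(tokens[: i + 1]).encode()).digest()
        return (digest[0] / 255.0, digest[1] / 255.0)

    def prefill(self, tokens):
        # Reuse the longest cached prefix, if there is one.
        start, kv = 0, []
        for n in range(len(tokens), 0, -1):
            hit = self._store.get(self._key(tokens[:n]))
            if hit is not None:
                start, kv = n, list(hit)
                break
        # Compute KV only for the tokens the cache did not cover.
        for i in range(start, len(tokens)):
            kv.append(self._compute_kv(tokens, i))
        # Store every newly covered prefix (wasteful, but fine for a toy).
        for n in range(start + 1, len(tokens) + 1):
            self._store[self._key(tokens[:n])] = tuple(kv[:n])
        return kv


store = PrefixKVStore()
system = ["You", "are", "a", "bot", "."]

store.prefill(system + ["Hi"])
print(store.computed)  # 6: nothing cached yet

store.prefill(system + ["Hi", "there"])
print(store.computed)  # 7: only the one new token was computed

store.prefill(system + ["Bye"])
print(store.computed)  # 8: the shared system prefix was reused
```

Note that nothing here returns a stored answer for "foo". A cache hit only saves the prefill work for the shared prefix.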