Hacker News
by
cyanf
482 days ago
They’re using the FS to cache the KV caches of past requests. That’s why they’re able to charge so little on prompt cache hits.
1 comments
jpgvm
482 days ago
Ahh, I missed that. Yes, prefix caching and RAG are two cases where you will want something like this at inference time.
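To make the prefix-caching idea concrete, here is a rough sketch of a disk-backed prefix cache. It is not the implementation discussed above, only an illustration under assumptions: `DiskPrefixCache`, `block_keys`, and `BLOCK` are hypothetical names, the block size is arbitrary, and the "KV" values are placeholder strings standing in for real attention key/value tensors.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

BLOCK = 4  # tokens per cached block (hypothetical; chosen small for the demo)


def block_keys(tokens):
    """Chain-hash each full block so a key identifies the whole prefix up to it."""
    keys, h = [], hashlib.sha256()
    full = len(tokens) - len(tokens) % BLOCK  # ignore a trailing partial block
    for i in range(0, full, BLOCK):
        h.update(repr(tokens[i:i + BLOCK]).encode())
        keys.append(h.hexdigest())  # digest so far; h keeps accumulating
    return keys


class DiskPrefixCache:
    """Stores one file per token block under `root`, named by its prefix hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, tokens, kv_blocks):
        """Write one KV blob per full block of `tokens`, skipping existing files."""
        for key, kv in zip(block_keys(tokens), kv_blocks):
            path = self.root / key
            if not path.exists():
                path.write_bytes(pickle.dumps(kv))

    def longest_prefix(self, tokens):
        """Return (tokens covered, KV blobs) for the longest cached prefix."""
        hits = []
        for key in block_keys(tokens):
            path = self.root / key
            if not path.exists():
                break  # first miss ends the shared prefix
            hits.append(pickle.loads(path.read_bytes()))
        return len(hits) * BLOCK, hits


cache = DiskPrefixCache(tempfile.mkdtemp())
cache.put([1, 2, 3, 4, 5, 6, 7, 8, 9], ["kv0", "kv1"])  # placeholder KV blobs
reused, blobs = cache.longest_prefix([1, 2, 3, 4, 5, 6, 7, 8, 42, 43, 44, 45])
print(reused, blobs)  # 8 tokens can skip prefill; only the rest is recomputed
```

Because each key hashes everything up to and including its block, two requests share a cache entry only when their prompts match from the very first token, which is the property a prefix cache needs.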