


	by cyanf 482 days ago
	They’re using the FS to cache the KV caches of past requests. That’s why they’re able to charge so little for prompt cache hits.

1 comment

Ahh, I missed that. Yes, prefix caching and RAG are two cases where you'll want something like this at inference time.
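
For anyone wondering what "KV caches on a filesystem" could look like, here is a rough sketch of prefix caching backed by plain files. It is not their implementation. `DiskPrefixCache`, `block_keys`, `fake_kv`, `serve` and the `BLOCK` size are all made-up names for illustration, and the "KV" payload is a placeholder rather than real attention tensors. The idea is that each block of tokens gets a key derived from the whole prefix up to that block, so a later request sharing the prefix can load those blocks from disk instead of recomputing them.

```python
import hashlib
import os
import pickle
import tempfile

BLOCK = 4  # tokens per cached block (hypothetical; real systems use larger blocks)

def block_keys(tokens):
    """Chained hashes: key i identifies the whole prefix ending at block i."""
    keys, h = [], hashlib.sha256()
    full = len(tokens) - len(tokens) % BLOCK  # ignore a trailing partial block
    for i in range(0, full, BLOCK):
        h.update(repr(tokens[i:i + BLOCK]).encode())
        keys.append(h.copy().hexdigest())
    return keys

class DiskPrefixCache:
    """Stores one file per prefix block under `root`."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key + ".kv")

    def lookup(self, tokens):
        """Return (tokens covered by cached blocks, the loaded blocks)."""
        blocks = []
        for key in block_keys(tokens):
            path = self._path(key)
            if not os.path.exists(path):
                break  # first miss ends the shared prefix
            with open(path, "rb") as f:
                blocks.append(pickle.load(f))
        return len(blocks) * BLOCK, blocks

    def store(self, tokens, kv_blocks):
        for key, kv in zip(block_keys(tokens), kv_blocks):
            with open(self._path(key), "wb") as f:
                pickle.dump(kv, f)

def fake_kv(tokens):
    """Placeholder for the expensive prefill step; one entry per full block."""
    full = len(tokens) - len(tokens) % BLOCK
    return [("kv", tuple(tokens[i:i + BLOCK])) for i in range(0, full, BLOCK)]

def serve(cache, tokens):
    """Return (cached tokens, tokens that still need prefill compute)."""
    hit, _blocks = cache.lookup(tokens)
    computed = len(tokens) - hit
    cache.store(tokens, fake_kv(tokens))
    return hit, computed

if __name__ == "__main__":
    cache = DiskPrefixCache(tempfile.mkdtemp())
    prompt = list(range(10))
    print(serve(cache, prompt))  # cold request: nothing cached yet
    print(serve(cache, prompt))  # repeat request: the full blocks are hits
```

On a repeat request only the tokens past the last full cached block need compute, which is where a cheaper cache-hit price would come from. The same lookup helps RAG when many requests share the same retrieved documents at the start of the prompt.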