	by boroboro4 455 days ago
	They do, in fact, mention inference KV cache as a use case in the README. The most advanced KV caching setups use a hierarchy of GPU RAM, regular RAM, and SSD. It seems they were able to use their storage abstraction for the last tier.
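
To make the GPU RAM / regular RAM / SSD hierarchy concrete, here is a minimal toy sketch in Python. It is not the project's code or API: the `TieredKVCache` class, its method names, and the LRU demote-on-evict / promote-on-hit policy are all assumptions made for illustration, and plain in-memory dictionaries stand in for the three storage tiers.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy multi-tier cache (hypothetical): LRU entries are demoted to the next slower tier."""

    def __init__(self, capacities):
        # capacities: max entries per tier, fastest first, e.g. (gpu, ram, ssd).
        self.capacities = list(capacities)
        self.tiers = [OrderedDict() for _ in self.capacities]

    def put(self, key, value):
        # Drop any stale copy so a key lives in exactly one tier.
        for store in self.tiers:
            store.pop(key, None)
        self._insert(key, value, 0)

    def _insert(self, key, value, tier):
        if tier >= len(self.tiers):
            return  # fell off the slowest tier: the entry is discarded
        store = self.tiers[tier]
        store[key] = value
        store.move_to_end(key)  # mark as most recently used
        if len(store) > self.capacities[tier]:
            # Evict the least recently used entry and demote it one tier down.
            old_key, old_value = store.popitem(last=False)
            self._insert(old_key, old_value, tier + 1)

    def get(self, key):
        for store in self.tiers:
            if key in store:
                value = store.pop(key)
                self._insert(key, value, 0)  # promote to the fastest tier on a hit
                return value
        return None  # miss: a real system would recompute the KV entries

    def tier_of(self, key):
        # Index of the tier currently holding the key, or None if absent.
        for i, store in enumerate(self.tiers):
            if key in store:
                return i
        return None


if __name__ == "__main__":
    cache = TieredKVCache((1, 1, 1))  # one slot each for "GPU", "RAM", "SSD"
    for prefix in ("a", "b", "c"):
        cache.put(prefix, f"kv-{prefix}")
    print({k: cache.tier_of(k) for k in ("a", "b", "c")})
```

In a real deployment the last tier would be backed by a storage system rather than a dictionary, which is the role the comment suggests their storage abstraction fills.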