| HN Mirror

	by mmoskal 453 days ago
	Just to clarify: simple prefix KV caching doesn't require any special model training. It does require the inference framework to support it, but most do by now. You can see dramatic improvements in latency and throughput when the queries share a large common prefix, since the keys and values for that prefix are computed once and reused instead of being recomputed for every request.
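
	To make the idea concrete, here is a toy sketch in Python. It is not how any particular inference framework implements it: `ToyPrefixKVCache`, `prefill`, and `_compute_kv` are made-up names, and the "KV" entries are placeholder integers rather than real tensors. It only illustrates that a request reuses the longest cached prefix and pays compute for the uncached suffix alone.

```python
from typing import Dict, List, Tuple


class ToyPrefixKVCache:
    """Toy stand-in for a prefix KV cache (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # Maps a token prefix to the placeholder "KV" entries for that prefix.
        # Storing every prefix separately is wasteful; real systems share storage.
        self._store: Dict[Tuple[int, ...], List[int]] = {}
        self.tokens_computed = 0  # how many tokens needed fresh work

    def _compute_kv(self, token: int, position: int) -> int:
        # Placeholder for the real per-token key/value computation.
        self.tokens_computed += 1
        return hash((token, position))

    def prefill(self, tokens: List[int]) -> List[int]:
        kv: List[int] = []
        start = 0
        # Look for the longest prefix of this request that is already cached.
        for end in range(len(tokens), 0, -1):
            cached = self._store.get(tuple(tokens[:end]))
            if cached is not None:
                kv = list(cached)
                start = end
                break
        # Compute only the uncached suffix, caching each extended prefix.
        for pos in range(start, len(tokens)):
            kv.append(self._compute_kv(tokens[pos], pos))
            self._store[tuple(tokens[: pos + 1])] = list(kv)
        return kv


cache = ToyPrefixKVCache()
shared_prefix = list(range(200))  # e.g. a long system prompt

cache.prefill(shared_prefix + [5001, 5002])
first_cost = cache.tokens_computed  # whole request computed from scratch

cache.prefill(shared_prefix + [6001, 6002, 6003])
second_cost = cache.tokens_computed - first_cost  # only the new suffix

print(first_cost, second_cost)
```

	The second request shares the 200-token prefix with the first, so only its three new tokens are computed. No change to the model is involved; the saving comes entirely from the serving side remembering work it has already done.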