| HN Mirror



	by zozbot234 8 days ago
	But text generation is still quadratic overall, even with the KV cache optimization. If every decode step now has to recompute the KV cache, including its latest and most expensive tokens (even with a quick "draft" model), that's even worse.
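	To put rough numbers on that, here is a back-of-the-envelope sketch that counts only query-key score computations in causal attention. The function names are made up for illustration, and it ignores layers, heads, MLP cost, and the fact that a draft model would be smaller, so treat it as a scaling argument rather than a benchmark.

```python
def attention_pairs_with_cache(n_tokens: int) -> int:
    """Query-key pairs scored when the KV cache is reused.

    At decode step t, one new query attends to t keys (itself plus the
    t - 1 cached ones), so the total is 1 + 2 + ... + n = n(n + 1) / 2.
    """
    return sum(t for t in range(1, n_tokens + 1))


def attention_pairs_recomputed(n_tokens: int) -> int:
    """Query-key pairs scored when every step rebuilds the cache from scratch.

    Assumption: rebuilding at step t means rerunning causal attention for
    all t positions, which costs t(t + 1) / 2 pairs at that step alone.
    """
    return sum(t * (t + 1) // 2 for t in range(1, n_tokens + 1))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        cached = attention_pairs_with_cache(n)
        recomputed = attention_pairs_recomputed(n)
        print(f"n={n}: cached={cached}, recomputed={recomputed}, "
              f"ratio={recomputed / cached:.1f}")
```

	Under these assumptions the cached total grows like n^2 / 2 while the recomputed total grows like n^3 / 6, so the gap widens roughly linearly with sequence length.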