

	by rfoo 483 days ago
	... and batching does not help: batch more requests and you get more KV cache to load, so it's still memory-access bound. MLA made it possible to cache a smaller form of k/v, which mitigates the problem but does not completely solve it (on shorter contexts & smaller batches it's still memory-access bound).
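
A rough back-of-the-envelope sketch of the point above, not a benchmark. The sizes used here (128 heads of dimension 128, a 512-wide latent plus a 64-wide positional part, 2-byte elements) are assumptions picked for illustration, and the function names are made up. The FLOP count is the usual approximation for plain multi-head attention during decode, per layer.

```python
def mha_kv_bytes_per_token(n_heads, head_dim, bytes_per_elem=2):
    # Plain multi-head attention caches one K and one V vector per head.
    return 2 * n_heads * head_dim * bytes_per_elem


def mla_kv_bytes_per_token(latent_dim, rope_dim, bytes_per_elem=2):
    # MLA-style cache: one compressed latent plus a small positional part,
    # shared by all heads (sizes are illustrative assumptions).
    return (latent_dim + rope_dim) * bytes_per_elem


def mha_decode_intensity(batch, context_len, n_heads, head_dim, bytes_per_elem=2):
    # Approximate FLOPs per byte of KV cache read for one decode step, one layer.
    # Per cached token and head: ~2*head_dim FLOPs for q.k, ~2*head_dim for weights.v.
    flops = batch * context_len * 4 * n_heads * head_dim
    bytes_read = batch * context_len * mha_kv_bytes_per_token(
        n_heads, head_dim, bytes_per_elem
    )
    return flops / bytes_read


if __name__ == "__main__":
    for batch in (1, 8, 64):
        # Each request has its own cache, so FLOPs and bytes grow together.
        print(batch, mha_decode_intensity(batch, 4096, 128, 128))

    mha = mha_kv_bytes_per_token(128, 128)
    mla = mla_kv_bytes_per_token(512, 64)
    print(mha, mla, round(mha / mla, 1))
```

Under these assumptions the FLOPs-per-byte figure is the same for every batch size, because each extra request brings its own cache to read. Shrinking the per-token cache reduces the bytes side directly, which is the mitigation described above.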