| HN Mirror

	by Balinares 63 days ago
	Isn't that exactly how draft models speed up inference, though? Validating a batch of tokens is significantly faster than generating them.
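	To make the point concrete, here is a minimal toy sketch of greedy speculative decoding. Everything in it is an assumption for illustration: `target_model` and `draft_model` are made-up deterministic functions standing in for a large and a small language model, and a "pass" is just a counter. In a real transformer the verification step would be one batched forward pass over all proposed positions, which is where the saving comes from; the per-position calls here only simulate that.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the large, expensive model."""
    return (sum(prefix) + len(prefix)) % 7


def draft_model(prefix: List[int]) -> int:
    """Hypothetical stand-in for the small model: agrees with the target
    except when the prefix length is a multiple of 4."""
    guess = target_model(prefix)
    return (guess + 1) % 7 if len(prefix) % 4 == 0 else guess


def plain_decode(target: Model, prompt: List[int], n_new: int) -> Tuple[List[int], int]:
    """Ordinary autoregressive decoding: one target pass per new token."""
    tokens = list(prompt)
    passes = 0
    for _ in range(n_new):
        tokens.append(target(tokens))
        passes += 1
    return tokens, passes


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target
    checks them all in what is counted as a single pass."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens one at a time (assumed cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target scores every proposed position plus one extra.
        #    Counted as ONE pass, since a real model would batch these.
        passes += 1
        verified = [target(tokens + proposal[:i]) for i in range(k + 1)]

        # 3. Accept the longest prefix on which draft and target agree.
        n_ok = 0
        while n_ok < k and proposal[n_ok] == verified[n_ok]:
            n_ok += 1
        tokens.extend(proposal[:n_ok])

        # 4. Append the target's own token: a correction at the first
        #    mismatch, or a bonus token if everything was accepted.
        tokens.append(verified[n_ok])
    return tokens[:goal], passes


prompt = [1, 2, 3]
plain_tokens, plain_passes = plain_decode(target_model, prompt, 12)
spec_tokens, spec_passes = speculative_decode(target_model, draft_model, prompt, 12)
print("same output:", plain_tokens == spec_tokens)
print("target passes:", plain_passes, "vs", spec_passes)
```

	The output is identical to plain greedy decoding because every emitted token is either confirmed or produced by the target; the draft only decides how many tokens each target pass gets to confirm. How much this helps in practice depends on how often the draft agrees with the target and on how cheap the draft really is, neither of which this toy measures.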