| HN Mirror

	by petu 51 days ago
	Speculative decoding batches multiple completions covering all possible outcomes (0/1/2 draft tokens accepted) and checks whether the big model deviates from the draft at any point -- thus verifying each token. So there's no difference in output compared with running the big model alone.
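
	A minimal sketch of the verification step described above, assuming greedy decoding and two toy stand-in models. `target_model`, `draft_model`, and the draft length `k` are hypothetical names made up for illustration, not any library's API. A real implementation would score all the prefixes in one batched forward pass of the big model; here that is simulated with a plain loop.

```python
from typing import Callable, List

# A "model" here is just a function from a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical "big" model: an arbitrary deterministic toy rule.
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except when the
    # prefix length is a multiple of 3, where it guesses wrong on purpose.
    guess = target_model(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 50


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 2) -> List[int]:
    out = list(prompt)
    goal = len(prompt) + n_new
    while len(out) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target scores every outcome (0..k draft tokens accepted).
        #    In practice this is a single batched pass, not k + 1 calls.
        checks = [target(out + proposal[:i]) for i in range(k + 1)]

        # 3. Accept draft tokens until the target first disagrees.
        accepted = 0
        while accepted < k and proposal[accepted] == checks[accepted]:
            accepted += 1

        # 4. Keep the accepted tokens, then append the target's own token
        #    (a correction on mismatch, a bonus token if all k matched).
        out.extend(proposal[:accepted])
        out.append(checks[accepted])
    return out[:goal]  # the last round may overshoot the requested length


if __name__ == "__main__":
    prompt = [1, 2, 3]
    same = speculative_decode(target_model, draft_model, prompt, 20) == \
        greedy_decode(target_model, prompt, 20)
    print("outputs identical:", same)
```

	Every token that ends up in the output is one the big model itself would have picked for that exact prefix, so the draft model only affects speed, never the result. With sampling instead of greedy decoding the acceptance rule is different, and this sketch does not cover it.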