| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by olejorgenb 81 days ago
	> I wonder if there is a more general solution that can make models spend more compute on making important choices, while making generation of the "obvious" tokens cheaper and faster. I think speculative decoding count as a (perhaps crude) way implementing this?