| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by woadwarrior01 421 days ago
	That should not be the case. Speculative decoding is trading off compute for memory bandwidth. The model's output is guaranteed to be the same, with or without it. Perhaps there's a bug in the implementation that you're using.