| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bubblethink 607 days ago
	Speculative decoding does not trade off accuracy. You reject the speculated tokens if the original model does not accept them, kind of like branch prediction. All these providers and third parties benchmark each other's solutions, so if there is a drop in accuracy, someone will report it. Their sequence length is 8k.