Speeding up LLM Inference with parallel decoding

	Speeding up LLM Inference with parallel decoding (twitter.com)
	1 point by pgspaintbrush 1075 days ago