Hacker News


	by throwdbaaway 128 days ago
	If you ask someone knowledgeable at r/LocalLLaMA about an inference configuration that can increase TG (token generation speed) by up to 2.5x, in particular for a sample prompt that reads "Refactor this module to use dependency injection", the answer is, of course, speculative decoding. You don't have to work for a frontier lab to know that. You just have to be GPU poor.
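	A refactoring prompt is a favorable case because much of the output repeats tokens already present in the prompt, so cheap drafts tend to be accepted. The comment does not say which speculative decoding variant it means. The sketch below assumes prompt-lookup (n-gram) drafting with greedy verification, and `make_oracle_target` is a hypothetical toy that stands in for the large model. Real implementations score the whole draft in one batched forward pass. Here each position is checked with a separate call, and `passes` counts the batched passes a real engine would run.

```python
from typing import Callable, List, Sequence, Tuple

Token = str
# Hypothetical stand-in for the large model: context -> greedy next token.
NextToken = Callable[[Sequence[Token]], Token]


def prompt_lookup_draft(context: Sequence[Token], ngram: int = 2, k: int = 4) -> List[Token]:
    """Draft up to k tokens by finding the most recent earlier occurrence of
    the last `ngram` tokens and copying the tokens that followed it."""
    if len(context) < ngram + 1:
        return []
    tail = list(context[-ngram:])
    # Scan backwards, skipping the tail itself.
    for start in range(len(context) - ngram - 1, -1, -1):
        if list(context[start:start + ngram]) == tail:
            return list(context[start + ngram:start + ngram + k])
    return []


def speculative_greedy(target: NextToken, prompt: Sequence[Token],
                       max_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy speculative decoding: the output matches plain greedy decoding,
    but several tokens can be accepted per target pass."""
    out = list(prompt)
    produced = 0
    passes = 0  # would-be batched target forward passes
    while produced < max_new:
        draft = prompt_lookup_draft(out, k=k)
        passes += 1  # one pass scores every draft position at once
        for tok in draft:
            if produced >= max_new or tok != target(out):
                break  # first mismatch: discard the rest of the draft
            out.append(tok)
            produced += 1
        # The same pass also yields the target's own next token:
        # the correction after a mismatch, or a bonus token if all matched.
        if produced < max_new:
            out.append(target(out))
            produced += 1
    return out[len(prompt):], passes


def make_oracle_target(prompt: Sequence[Token], reference: Sequence[Token]) -> NextToken:
    """Toy oracle (hypothetical): emits `reference` after `prompt`, by position."""
    full = list(prompt) + list(reference)
    return lambda ctx: full[len(ctx)] if len(ctx) < len(full) else "<eos>"


prompt = "def f ( x ) : return g ( x )".split()
reference = "def f ( x , g ) : return g ( x )".split()  # refactor: g is passed in
target = make_oracle_target(prompt, reference)
new_tokens, passes = speculative_greedy(target, prompt, max_new=len(reference))
print(" ".join(new_tokens), f"({len(new_tokens)} tokens, {passes} passes)")
```

	On this toy input the sketch emits the 13 reference tokens in 7 target passes, where plain greedy decoding would need 13. N-gram lookup needs no second model, which suits a GPU-poor setup. The other common choice is a small draft model. Actual wall-clock gains depend on the cost of each verification pass, so this does not reproduce the 2.5x figure.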