| HN Mirror



	by make3 143 days ago
	There are a million ways to make LLM inference more efficient at the cost of output quality: using a smaller model, using quantized models, using speculative decoding with a more permissive rejection threshold, and so on.
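
To make the last item concrete, here is a minimal sketch of what a "more permissive rejection threshold" could look like. It assumes the usual speculative sampling rule, where a drafted token x is accepted with probability min(1, p(x) / q(x)) (p is the target model's probability, q the draft model's), and loosens it by scaling q(x) with a factor between 0 and 1. The names `accept_probability`, `expected_acceptance_rate`, and `lenience` are made up for illustration and don't come from any particular library.

```python
def accept_probability(p_target, q_draft, lenience=1.0):
    """Probability of keeping a drafted token.

    p_target: target model's probability for the drafted token.
    q_draft:  draft model's probability for the same token (must be > 0).
    lenience: hypothetical knob in (0, 1]; 1.0 is the standard rule,
              smaller values accept more drafted tokens, so the output
              no longer follows the target distribution exactly.
    """
    if not 0.0 < lenience <= 1.0:
        raise ValueError("lenience must be in (0, 1]")
    if q_draft <= 0.0:
        raise ValueError("q_draft must be positive")
    return min(1.0, p_target / (lenience * q_draft))


def expected_acceptance_rate(p, q, lenience=1.0):
    """Average acceptance probability when tokens are drawn from q.

    p and q are probability lists over the same vocabulary.
    """
    return sum(
        qx * accept_probability(px, qx, lenience)
        for px, qx in zip(p, q)
        if qx > 0.0  # tokens the draft never proposes contribute nothing
    )


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # toy target distribution
    q = [0.2, 0.3, 0.5]  # toy draft distribution
    print(expected_acceptance_rate(p, q, lenience=1.0))
    print(expected_acceptance_rate(p, q, lenience=0.5))
```

A lower `lenience` keeps more drafted tokens per verification pass, so there are fewer expensive target-model steps, and in exchange the samples drift toward what the draft model would have said. That's the efficiency-for-quality trade in miniature.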