| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by zhihaojia 1172 days ago
	SpecInfer is a system that accelerates generative LLM serving with speculative inference and token tree verification. The key idea is to use an LLM as a token tree verifier instead of an incremental decoder. We show that this reduces LLM inference latency by 2.8x.