Accelerating LLM Serving with Speculative Inference and Token Tree Verification

SpecInfer is a system that accelerates generative LLM serving with speculative inference and token tree verification. The key idea is to use an LLM as a token tree verifier instead of an incremental decoder. We show that this reduces LLM inference latency by 2.8x.