Accelerating LLM Serving with Speculative Inference and Token Tree Verification (github.com)
3 points by zhihaojia 1126 days ago
1 comment

SpecInfer is a system that accelerates generative LLM serving with speculative inference and token tree verification. The key idea is to use an LLM as a token tree verifier instead of an incremental decoder. We show that this reduces LLM inference latency by 2.8x.
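
To make the "verifier instead of incremental decoder" idea concrete, here is a toy sketch of greedy token tree verification. It is not SpecInfer's code: `verify_token_tree`, `toy_target_next`, and the nested-dict tree layout are all made up for illustration, and the target model is called once per accepted token here purely for readability. I'm assuming a real system would score every tree node in a single batched pass, which is where the latency win would come from.

```python
from typing import Callable, Dict, List

# Hypothetical layout: each speculated token maps to the subtree that may follow it.
TokenTree = Dict[str, "TokenTree"]


def verify_token_tree(
    prefix: List[str],
    tree: TokenTree,
    target_next: Callable[[List[str]], str],
) -> List[str]:
    """Walk the speculated tree, keeping only tokens the target model agrees with.

    Always returns at least one token: the target model's own choice at the
    first position where no speculated branch matches.
    """
    accepted: List[str] = []
    node = tree
    while True:
        # Ask the (toy) target model what it would emit after prefix + accepted.
        token = target_next(prefix + accepted)
        accepted.append(token)
        if token not in node:
            # No speculated branch matches: stop, keeping the target's token.
            return accepted
        # A branch matched: descend and keep verifying.
        node = node[token]


def toy_target_next(tokens: List[str]) -> str:
    """Stand-in for an LLM's greedy next-token choice (assumption, not a real model)."""
    table = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}
    return table.get(tokens[-1], "the")


# Speculated continuations of the prompt ["the"], e.g. from small draft models.
speculated: TokenTree = {
    "cat": {"sat": {"down": {}}, "ran": {}},
    "dog": {"sat": {}},
}

result = verify_token_tree(["the"], speculated, toy_target_next)
print(result)
```

In this run the verifier accepts "cat" and "sat" from the tree and then emits its own "on" where the speculation ("down") diverges, so three tokens come out of what would be a single verification step in a batched implementation. The output is the same as plain greedy decoding with the target model; the tree only changes how many tokens can be confirmed at once.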