|
|
|
|
|
by zhihaojia
1124 days ago
|
|
SpecInfer is a system that accelerates generative LLM serving with speculative inference and token tree verification. The key idea is to use an LLM as a token tree verifier instead of an incremental decoder. We show that this reduces LLM inference latency by 2.8x. |
|