by joha4270
120 days ago
The guts of an LLM aren't something I'm well versed in, but

> to get the first N tokens sorted, only when the big model and small model diverge do you infer on the big model

suggests there is something I'm unaware of. If you compare the small and big models, don't you have to wait for the big model anyway, and then what's the point? I assume I'm missing some detail here, but what?

More info:
* https://research.google/blog/looking-back-at-speculative-dec...
* https://pytorch.org/blog/hitchhikers-guide-speculative-decod...
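
For what it's worth, here is a rough toy sketch of the greedy variant of speculative decoding, which may help with the "don't you have to wait for the big model anyway" part. The detail that seems to matter: the big model can check all k drafted positions in a single parallel forward pass, so one big-model pass can yield several tokens instead of one. Everything here is an assumption for illustration: `draft_next` and `target_next` are made-up stand-ins for the small and big models, and the loop in step 2 only simulates what a real transformer would do in one batched pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # context -> greedy next token


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[Token]:
    """Return the tokens gained from one (simulated) big-model pass."""
    # 1. The small model drafts k tokens one at a time (cheap, sequential).
    drafted: List[Token] = []
    for _ in range(k):
        drafted.append(draft_next(prefix + drafted))

    # 2. The big model predicts the next token at each of the k+1 positions.
    #    In a real transformer this is ONE batched forward pass; the list
    #    comprehension below only simulates that.
    target_preds = [target_next(prefix + drafted[:i]) for i in range(k + 1)]

    # 3. Keep the longest run of drafted tokens the big model agrees with,
    #    then append the big model's own token at the first disagreement
    #    (or a bonus token if every draft was accepted).
    n = 0
    while n < k and drafted[n] == target_preds[n]:
        n += 1
    return drafted[:n] + [target_preds[n]]


# Hypothetical toy models: the "big" model counts upward mod 10,
# the "small" model does the same but gets it wrong right after a 4.
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


print(speculative_step([1], draft_next, target_next, k=4))
```

In this toy run the draft is right three times and wrong once, so a single simulated big-model pass produces four tokens that match what the big model alone would have generated. When the draft is wrong at the first position you still get one token, so under these assumptions the cost is the wasted draft work rather than worse output.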