|
|
|
|
|
by regularfry
606 days ago
|
|
Speculative decoding is using a small model to quickly generate a sequence that every so often you pass through a larger model to check and correct. It can be much faster than just using the larger model, with tolerably close accuracy. |
|
No, speculative decoding has exactly the same accuracy as the target model. It is mathematically identical to greedy decoding.