by petu, 51 days ago
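Speculative decoding has the big model score, in a single batched pass, every possible outcome (0, 1, or 2 draft tokens accepted) and checks whether it deviates from the draft at any point, thus verifying each token. Draft tokens are kept only up to the first deviation, where the big model's own token is used instead. So there's no difference in output.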
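A rough sketch of that verification loop, assuming greedy decoding and two toy stand-in models (`target_next` and `draft_next` are hypothetical functions made up for illustration, not any real library's API). In a real system the `k + 1` target calls would be one batched forward pass; here they are a plain list comprehension.

```python
from typing import Callable, List

# A "model" here is just a greedy next-token function: context -> token id.
Model = Callable[[List[int]], int]


def target_next(ctx: List[int]) -> int:
    """Hypothetical 'big' model: an arbitrary deterministic toy function."""
    return (sum(ctx) * 31 + len(ctx) * 7) % 10


def draft_next(ctx: List[int]) -> int:
    """Hypothetical cheap draft model: agrees with the target most of the time."""
    t = target_next(ctx)
    return t if len(ctx) % 3 else (t + 1) % 10


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    """Plain one-token-at-a-time greedy decoding, used as the reference."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(target: Model, draft: Model,
                       prompt: List[int], n: int, k: int = 2) -> List[int]:
    """Greedy speculative decoding with k draft tokens per step."""
    out = list(prompt)
    while len(out) - len(prompt) < n:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target scores every prefix: 0, 1, ..., k draft tokens accepted.
        #    (One batched pass in practice; sequential calls in this sketch.)
        checks = [target(out + proposal[:i]) for i in range(k + 1)]

        # 3. Accept draft tokens until the target first disagrees.
        accepted = 0
        while accepted < k and proposal[accepted] == checks[accepted]:
            accepted += 1

        # 4. Keep the accepted tokens, then the target's own token at the
        #    first deviation (or a bonus token if all k were accepted).
        out.extend(proposal[:accepted])
        out.append(checks[accepted])
    return out[len(prompt):len(prompt) + n]


if __name__ == "__main__":
    prompt = [1, 2, 3]
    print(speculative_decode(target_next, draft_next, prompt, 20)
          == greedy_decode(target_next, prompt, 20))
```

Every emitted token is either a draft token the target agreed with or the target's own pick, so the result matches plain greedy decoding regardless of how good the draft is. A bad draft only costs speed.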