by furyofantares
39 days ago
Here is the line in vLLM's source code that determines whether a draft token is accepted:

  accepted = draft_prob > 0 and target_prob / draft_prob >= uniform_prob

There is also a branch that checks only token ID equality; it is used when temperature is 0.
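
To make the two branches concrete, here is a minimal Python sketch of the checks described above. It is not vLLM's code: the function names are made up for illustration, and I'm assuming uniform_prob is a fresh sample from U[0, 1) for each draft token.

```python
def accept_draft_token(draft_prob, target_prob, uniform_prob):
    """Probabilistic check, mirroring the quoted line.

    draft_prob:   probability the draft model gave the proposed token
    target_prob:  probability the target model gives the same token
    uniform_prob: assumed to be a sample from U[0, 1)
    """
    return draft_prob > 0 and target_prob / draft_prob >= uniform_prob


def accept_draft_token_greedy(draft_token_id, target_token_id):
    """Temperature-0 check: accept only if the token IDs match.

    target_token_id is assumed to be the target model's argmax token.
    """
    return draft_token_id == target_token_id
```

Note that when the target model likes the token at least as much as the draft model does (ratio >= 1), the token is always accepted; otherwise it is accepted with probability target_prob / draft_prob.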
Edit: I haven't gone through all the code, but they might do something like this paper: https://arxiv.org/abs/2211.17192, where a draft model proposes tokens and, on rejection, the replacement token is sampled from an adjusted distribution, so the output has exactly the same distribution as the main model.
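
A rough sketch of why that works, as I understand the paper (again, not taken from vLLM; the function names are hypothetical). On rejection you resample from the normalized positive part of target minus draft, and the accepted mass plus the resampled mass adds back up to the target distribution:

```python
def residual_distribution(target_probs, draft_probs):
    """Distribution to resample from after a rejection: norm(max(0, p - q))."""
    residual = [max(0.0, p - q) for p, q in zip(target_probs, draft_probs)]
    total = sum(residual)
    if total == 0.0:
        # The two distributions are identical, so a rejection cannot
        # actually happen; fall back to the target distribution.
        return list(target_probs)
    return [r / total for r in residual]


def speculative_output_distribution(target_probs, draft_probs):
    """Overall probability of emitting each token for one position."""
    # Token x is drafted with prob q(x) and accepted with prob
    # min(1, p(x) / q(x)), so the accepted mass is min(p(x), q(x)).
    accept_mass = [min(p, q) for p, q in zip(target_probs, draft_probs)]
    reject_prob = 1.0 - sum(accept_mass)
    residual = residual_distribution(target_probs, draft_probs)
    return [a + reject_prob * r for a, r in zip(accept_mass, residual)]
```

For example, with target [0.5, 0.3, 0.2] and draft [0.2, 0.3, 0.5], speculative_output_distribution returns the target distribution again (up to floating-point error), even though the draft model is quite different.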