by isoprophlex 321 days ago

but... do you get any validation during the forward pass? the small model could just as well have generated "is Berlin." or whatever. do these models somehow give you a likelihood for the next token when you're prefilling, that you can compare against? if so why not just... use that always?

or is this a scenario where computation is expensive but validation is cheap?

EDIT: thanks, people, for educating me! very insightful :)

3 comments

sanxiyn 321 days ago

Yes, models give likelihoods you can compare against. No, you can't do that without drafting, because the likelihood of token N+2 depends on token N+1. That is, you get P(is | The capital of France) and P(Berlin | The capital of France is), but for the latter you need to give "is" as input; you can't compute P(Berlin | The capital of France _).
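To make the dependence concrete, here is a minimal sketch in Python. The lookup table `TOY_LM` and the helpers `next_token_probs` and `score_draft` are made up for illustration and stand in for a real model. A real transformer would produce all of these per-position probabilities in one parallel pass; the loop is sequential only for readability.

```python
# Hypothetical toy "language model": maps a prefix to next-token probabilities.
# The numbers are invented for illustration only.
TOY_LM = {
    ("The", "capital", "of", "France"): {"is": 0.9, "was": 0.1},
    ("The", "capital", "of", "France", "is"): {"Paris": 0.95, "Berlin": 0.05},
    ("The", "capital", "of", "France", "was"): {"Paris": 0.6, "Lutetia": 0.4},
}


def next_token_probs(prefix):
    """Return the toy next-token distribution for a fully specified prefix."""
    return TOY_LM.get(tuple(prefix), {})


def score_draft(prompt, draft):
    """Probability of each draft token given the prompt and all earlier draft tokens."""
    scores = []
    prefix = list(prompt)
    for tok in draft:
        scores.append(next_token_probs(prefix).get(tok, 0.0))
        # The next position is conditioned on this concrete token,
        # which is why a draft (or a real sample) is needed here.
        prefix.append(tok)
    return scores


prompt = ["The", "capital", "of", "France"]
print(score_draft(prompt, ["is", "Berlin"]))
print(score_draft(prompt, ["was", "Paris"]))
```

The second score in each list changes with the first draft token, so there is no way to look it up while leaving that slot blank.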

pama 321 days ago

If you want to go down the rabbit hole of the state of the art, I recommend the EAGLE3 paper: https://arxiv.org/abs/2503.01840

shikon7 321 days ago

Yes, the forward pass produces a next-token prediction at every input position (so we know exactly how many tokens from the small model matched). The expensive part is not the computation but the memory bandwidth, as each pass needs to load the model weights from memory.

If the small model predicts some tokens correctly, you save some passes, at the expense of doing some extra computations when the tokens were not correct.

In any case, each forward pass will give at least one new token.
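A rough sketch of that verification step, assuming the simple greedy variant (accept a draft token only if it equals the large model's top choice; sampling-based schemes use a probabilistic accept/reject rule instead). `SENTENCE`, `target_next`, and `verify` are hypothetical names, and the "large model" is just a fixed sentence so the example runs on its own.

```python
# Hypothetical stand-in for the large model: it always continues this sentence.
SENTENCE = ["The", "capital", "of", "France", "is", "Paris", ".", "<eos>"]


def target_next(prefix):
    """Greedy next token of the toy large model for a given prefix."""
    n = len(prefix)
    if n < len(SENTENCE) and list(prefix) == SENTENCE[:n]:
        return SENTENCE[n]
    return "<eos>"


def verify(prompt, draft):
    """Return the tokens gained from one large-model pass over prompt + draft."""
    # A real model yields all of these predictions from a single forward pass;
    # here they are computed one by one for clarity.
    preds = [
        target_next(list(prompt) + list(draft[:i]))
        for i in range(len(draft) + 1)
    ]
    accepted = []
    for tok, pred in zip(draft, preds):
        if tok != pred:
            break  # first mismatch: discard the rest of the draft
        accepted.append(tok)
    # The large model's own prediction at the stopping point is always kept,
    # so every pass yields at least one new token.
    return accepted + [preds[len(accepted)]]


prompt = ["The", "capital", "of", "France"]
print(verify(prompt, ["is", "Berlin", "."]))  # draft goes wrong at the second token
print(verify(prompt, ["was"]))                # draft is wrong immediately
```

With the draft "is Berlin ." the pass keeps "is" and replaces "Berlin" with the large model's own token, so two tokens come out of one pass; with a draft that is wrong from the start, the pass still produces one token, the same as ordinary decoding.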