by isoprophlex
321 days ago
but... do you get any validation during the forward pass? the small model could just as well have generated "is Berlin." or whatever. do these models somehow give you a likelihood for the next token when you're prefilling, that you can compare against? if so why not just... use that always? or is this a scenario where computation is expensive but validation is cheap?

EDIT: thanks, people, for educating me! very insightful :)