by furyofantares
321 days ago
Not an expert, but here's how I understand it. You know how input tokens are cheaper than output tokens? It's related to that: a model can process a whole run of known tokens in parallel, but has to generate new tokens one at a time. Say the text so far is "The capital of France". The small model generates "is Paris.", which let's say is 5 tokens. You feed the large model "The capital of France is Paris." and it validates all 5 of those tokens in a single forward pass, instead of spending 5 passes generating them itself.
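Here's a rough sketch of that verification step, in case it helps. Everything in it is made up for illustration: the "models" are just lookup functions (`make_toy_model`, `verify_draft` are hypothetical names), tokens are whole words rather than real subword tokens, and it only covers the greedy case where a draft token is accepted if it matches what the large model would have picked. A real transformer gets all of the per-position predictions from one forward pass; the loop below only simulates reading them off.

```python
from typing import Callable, List

Token = str
# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def make_toy_model(sentence: List[Token]) -> NextToken:
    """Toy 'model' that always continues along one fixed sentence."""
    def next_token(prefix: List[Token]) -> Token:
        return sentence[len(prefix)] if len(prefix) < len(sentence) else "<eos>"
    return next_token


def verify_draft(large: NextToken, prefix: List[Token], draft: List[Token]) -> List[Token]:
    """Keep draft tokens until the large model disagrees.

    At the first mismatch, the large model's own token is used instead and
    the rest of the draft is thrown away. If everything matches, the large
    model's prediction after the draft comes along for free.
    """
    accepted: List[Token] = []
    for i, tok in enumerate(draft):
        # In a real system these predictions all come from ONE forward pass
        # over prefix + draft; here we just query the toy model per position.
        expected = large(prefix + draft[:i])
        if expected != tok:
            accepted.append(expected)  # correction from the large model
            return accepted
        accepted.append(tok)
    accepted.append(large(prefix + draft))  # bonus token, draft fully accepted
    return accepted


large = make_toy_model(["The", "capital", "of", "France", "is", "Paris", "."])
prefix = ["The", "capital", "of", "France"]

good = verify_draft(large, prefix, ["is", "Paris", "."])
bad = verify_draft(large, prefix, ["is", "Lyon", "."])
print(good)  # ['is', 'Paris', '.', '<eos>']
print(bad)   # ['is', 'Paris']
```

Either way the output is exactly what the large model would have produced on its own, so a bad draft costs you speed, not quality.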
or is this a scenario where computation is expensive but validation is cheap?
EDIT: thanks, people, for educating me! very insightful :)