by sasja
820 days ago
If you work out the loss function for next-token prediction, next-2-token prediction, or next-n-token prediction, you will find they are identical: by the chain rule of probability, the log-likelihood of an n-token continuation is the sum of the per-token log-likelihoods. So it's equally correct to say the model is trained to find the most probable continuation of unlimited length. Saying "it only predicts the next token" is not untrue, but it easily leads to wrong conclusions.
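
To make the claim concrete, here is a minimal sketch, not anyone's actual training code. It assumes a toy two-symbol vocabulary and an arbitrary random joint distribution over 3-token sequences. All the names (`make_joint`, `prefix_prob`, `next_token_nll`, `sequence_nll`) are made up for illustration. It derives the next-token conditionals from the joint and checks that the summed next-token losses equal the whole-sequence loss.

```python
import itertools
import math
import random

VOCAB = ("a", "b")   # toy vocabulary (assumption for illustration)
LENGTH = 3           # toy sequence length (assumption for illustration)


def make_joint(seed=0):
    """Build an arbitrary normalized distribution over all full sequences."""
    rng = random.Random(seed)
    seqs = list(itertools.product(VOCAB, repeat=LENGTH))
    weights = [rng.random() + 0.1 for _ in seqs]  # strictly positive weights
    total = sum(weights)
    return {s: w / total for s, w in zip(seqs, weights)}


def prefix_prob(joint, prefix):
    """Marginal probability that a sequence starts with `prefix`."""
    k = len(prefix)
    return sum(p for s, p in joint.items() if s[:k] == prefix)


def next_token_nll(joint, seq):
    """Sum of per-step next-token losses: -sum_t log p(x_t | x_<t)."""
    nll = 0.0
    for t in range(len(seq)):
        cond = prefix_prob(joint, seq[: t + 1]) / prefix_prob(joint, seq[:t])
        nll -= math.log(cond)
    return nll


def sequence_nll(joint, seq):
    """Loss for predicting the whole sequence at once: -log p(x_1..x_n)."""
    return -math.log(joint[seq])


joint = make_joint()
for seq in joint:
    # The conditionals telescope, so the two losses agree for every sequence.
    assert math.isclose(next_token_nll(joint, seq), sequence_nll(joint, seq))
```

The equality holds for any joint distribution, because each conditional is a ratio of prefix probabilities and the product telescopes to the probability of the full sequence.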
Indeed, it's akin to saying that "only quantum fields exist" and then concluding that therefore people do not exist.