Hacker News
by sasja 820 days ago
If you work out the loss function for next-token prediction, next-2-token prediction, or next-n-token prediction, you will find they are identical, since by the chain rule the probability of any continuation factors into a product of next-token probabilities. So it's equally correct to say the model is trained to find the most probable unlimited continuation. Saying "it only predicts the next token" is not untrue but easily leads to wrong conclusions.
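
A minimal sketch of why the losses coincide, assuming a toy bigram model (the table `COND` and the function names are made up for illustration, not taken from any real library). By the chain rule, the log-probability of an n-token chunk is the sum of the per-token log-probabilities, so scoring a sequence in chunks of 1, 2, or n tokens should give the same total loss:

```python
import math

# Hypothetical toy model: P(next token | previous token) over the vocabulary {"a", "b"}.
# "<s>" is an assumed start-of-sequence marker.
COND = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.1, "b": 0.9},
    "b": {"a": 0.7, "b": 0.3},
}


def chunk_prob(prev, chunk):
    """Probability the model assigns to a whole chunk, given the token before it."""
    p = 1.0
    for tok in chunk:
        p *= COND[prev][tok]
        prev = tok
    return p


def chunk_loss(seq, n):
    """Negative log-likelihood when the sequence is scored n tokens at a time."""
    total = 0.0
    for i in range(0, len(seq), n):
        prev = seq[i - 1] if i > 0 else "<s>"
        total += -math.log(chunk_prob(prev, seq[i:i + n]))
    return total


seq = ["a", "b", "b", "a", "b"]
for n in (1, 2, 3, len(seq)):
    # n = 1 is ordinary next-token prediction; n = len(seq) scores the whole sequence at once.
    print(n, chunk_loss(seq, n))
```

The printed losses agree up to floating-point rounding, which is all the "identical loss" claim amounts to in this toy setting.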
1 comment

> Saying "it only predicts the next token" is not untrue but easily leads to wrong conclusions.

Indeed, it's akin to saying that "only quantum fields exist" and then concluding that people therefore do not exist.