by cornel_io
844 days ago
"Next token predictor" isn't quite the burn it seems like, because perfect next-token prediction would require actual understanding. You can almost always cast a question about understanding into a form where the answer depends solely on the next token (there are a couple of nitpicky exceptions and caveats, but not many). GPT-4 is at a high enough level of performance that simple statistics alone aren't really helping it do any better; it really is developing structures, especially in the middle layers, that perform some amount of high-level understanding. I don't think pure next-token prediction will always be the optimal way to train and enhance these behaviors, but it's not fair to say it's unrelated. If this really were just stochastic parroting, LLMs would have topped out way before the level they're at now.