No. The how is relevant here because it leads to understanding of the resulting behavior.
If you train the LLM on a corpus that shows people saying the sky is red, you get an LLM that is predisposed to say the sky is red. This is true even if it's also trained on all of the science that explains how and why the sky is blue.
If it were to "figure out" or "reason", it would not have such a predisposition to emit "red" after "the sky is" just because that matches the reward during training.
In other words, the token prediction is important because it both explains the successes AND the failures of the LLM. If there were situations in which a bird could fail to fly, then how it tried to fly would also be crucial knowledge.
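To make the "sky is red" point concrete, here is a minimal sketch using a toy count-based next-token predictor. It is nowhere near a real LLM (no neural network, no gradient training), and the corpus, the `train` and `next_token_probs` helpers, and the 3-token context are all made up for illustration. It only shows how a skewed corpus produces a skewed next-token distribution, even when a correct explanation is also in the data.

```python
from collections import Counter, defaultdict

def train(corpus, context_len=3):
    """Count which token follows each context of `context_len` tokens."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i in range(context_len, len(tokens)):
            context = tuple(tokens[i - context_len:i])
            counts[context][tokens[i]] += 1
    return counts

def next_token_probs(counts, context):
    """Turn raw follow-up counts into a probability distribution."""
    followers = counts[tuple(context)]
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}

# Hypothetical corpus: mostly people saying "red", plus some correct science.
corpus = (
    ["the sky is red"] * 8
    + ["the sky is blue"] * 2
    + ["rayleigh scattering explains why the sky is blue"]
)

model = train(corpus)
probs = next_token_probs(model, ["the", "sky", "is"])
print(probs)  # "red" gets 8/11 of the mass, "blue" gets 3/11
```

The science sentence does move some probability toward "blue", but the predictor still prefers "red" after "the sky is" because that is what the data mostly shows.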
You can also teach humans science and math and then they can be trained by a cult to not use any of that reasoning when emitting canned responses that they were rewarded by the cult for internalizing during their training. "Fake News!"
You're caught up on the mechanics of token processing (floating point matrix ALU math) and ignoring the context: the function p(next token) is being "computed" over a trillion parameters. You can poorly train a model, sure, but assuming you don't indoctrinate it too much, properties like cognition emerge. It learns to reason. Why? Because reasoning is more efficient and compact than memorizing answers.
I completely agree that humans sometimes are not applying reasoning to things.
I'm not trying to argue a model cannot "reason" or have "cognition", whatever those things are. I'm only saying that it's absolutely the case that whatever those things are, they come from its mechanism of predicting one token at a time ad infinitum, and that throwing away a deep understanding in favor of a shallow one is foolish. Just because it might seem to be "reasoning" does not mean it IS doing so, and certainly giving the appearance of reasoning does not mean it is NOT a token predictor.
If I knew deeply how the human brain works I would use that understanding instead of saying things like "this person reasons" or "this person thinks".
In summary, I'm not "caught up in" anything - I'm just trying to point out that the original poster here is incorrect in saying that clearly LLMs aren't working through token prediction. They are, and all their behavior is 100% explained by token prediction. That's more than enough for interesting behavior!
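For anyone who wants the "one token at a time" mechanism spelled out, here is a rough sketch of an autoregressive sampling loop. `TOY_MODEL` and `generate` are hypothetical stand-ins: the toy table conditions only on the previous token, whereas a real LLM conditions on the whole context through a learned network. The shape of the loop (predict a distribution, sample a token, append it, repeat) is the part being illustrated.

```python
import random

# Hypothetical toy "model": maps the last token to a distribution over next tokens.
TOY_MODEL = {
    "<s>": {"the": 1.0},
    "the": {"sky": 1.0},
    "sky": {"is": 1.0},
    "is": {"blue": 0.7, "red": 0.3},
    "blue": {"</s>": 1.0},
    "red": {"</s>": 1.0},
}

def generate(model, max_tokens=10, seed=0):
    """Sample one token at a time until the end marker or the length limit."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = model[tokens[-1]]                 # p(next token | context)
        choices, weights = zip(*dist.items())
        nxt = rng.choices(choices, weights=weights, k=1)[0]
        if nxt == "</s>":
            break
        tokens.append(nxt)                       # the output becomes new context
    return tokens[1:]                            # drop the start marker

print(" ".join(generate(TOY_MODEL)))
```

Note that nothing in the loop goes back and revises an earlier token. Whatever looks like planning has to come out of the distribution computed at each step.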
Not an expert, but how does this explain planning or anything creative? That's just generating things according to the world model with no error correction afterwards.