HN Mirror

Hacker News

by nexttk 377 days ago

I haven't read it all, and I must admit I'm not sure I really understood the parts I did read. Reading the section under the headline "Why We Need the World, and How LLMs Pretend to Understand It", with its focus on 'next-token-prediction', makes me wonder how seriously to take it. It just seems like another "LLMs are not intelligent, they are merely next-token predictors" argument, which in my view is completely invalid and based on a misunderstanding.

The fact that they predict the next token is just the "interface", i.e., an LLM has the interface "predictNextToken(String prefix)". That says nothing about how it is implemented. One implementation could be a human brain. Another could be a simple lookup table that looks at the last word and selects the next one from that. Or anything in between. The point is that 'next-token-prediction' says nothing about the implementation and so does not limit the capabilities, even though it is often invoked as if it did. Although the model is only required to emit the next token (or rather, a probability distribution over possible next tokens), it is free to think far ahead, and indeed has to if it is to predict even the next token well.

As interpretability research (and common sense) shows, LLMs have a fairly good idea of what they are going to say many tokens ahead, and that is what lets them make a good prediction of the immediate next token. That's why you get nice, coherent, well-structured, long responses from LLMs, and why you have probably never seen one get stuck in a dead end where it can't generate a meaningful continuation.
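To make the interface-versus-implementation point concrete, here is a minimal Python sketch. `NextTokenPredictor`, `BigramLookupTable` and `greedy_continue` are hypothetical names invented for illustration (a snake_case rendering of the `predictNextToken(String prefix)` signature above); they are not from any real library, and this is not how an LLM is built. The bigram table is the "look at the last word" extreme mentioned in the comment, and the caller only ever sees the shared signature.

```python
from abc import ABC, abstractmethod
from collections import Counter, defaultdict


class NextTokenPredictor(ABC):
    """Hypothetical interface: map a prefix to a probability distribution
    over next tokens. It says nothing about how that is computed."""

    @abstractmethod
    def predict_next_token(self, prefix: str) -> dict[str, float]:
        ...


class BigramLookupTable(NextTokenPredictor):
    """The crudest implementation: look only at the last word of the prefix
    and use counts of which words followed it in a tiny corpus."""

    def __init__(self, corpus: str) -> None:
        words = corpus.split()
        self.counts: defaultdict[str, Counter[str]] = defaultdict(Counter)
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def predict_next_token(self, prefix: str) -> dict[str, float]:
        words = prefix.split()
        # Unknown or empty context: return an empty distribution.
        if not words or words[-1] not in self.counts:
            return {}
        followers = self.counts[words[-1]]
        total = sum(followers.values())
        return {tok: n / total for tok, n in followers.items()}


def greedy_continue(model: NextTokenPredictor, prefix: str, steps: int) -> str:
    """Caller code that depends only on the interface: repeatedly append the
    most probable next token. It works unchanged for any implementation."""
    for _ in range(steps):
        dist = model.predict_next_token(prefix)
        if not dist:
            break
        prefix += " " + max(dist, key=dist.get)
    return prefix


if __name__ == "__main__":
    model = BigramLookupTable("the cat sat on the mat")
    print(model.predict_next_token("look at the"))  # {'cat': 0.5, 'mat': 0.5}
    print(greedy_continue(model, "the", 3))         # the cat sat on
```

Swapping `BigramLookupTable` for a far more capable implementation would not change `greedy_continue` at all, which is the sense in which the signature alone tells you little about what sits behind it.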

If you want to reason about LLM capabilities, never think in terms of "stochastic parrot" or "it's just a next-token predictor": that framing contains exactly zero useful information and will only confuse you.


lsy 377 days ago

I think people hear "next token prediction" and think someone is saying the prediction is simple or linear, and then argue there is a possibility of "intelligence" because the prediction is complex and has some level of indirection or multiple-token-ahead planning baked into the next token.

But the thrust of the critique of next-token prediction or stochastic output is that there isn't "intelligence" because the output is based purely on syntactic relations between words, not on conceptualizing via a world model built through experience, and then using language as an abstraction to describe the world. To the computer there is nothing outside tokens and their interrelations, but for people language is just a tool with which to describe the world with which we expect "intelligences" to cope. Which is what this article is examining.


famouswaffles 377 days ago

>But the thrust of the critique of next-token prediction or stochastic output is that there isn't "intelligence" because the output is based purely on syntactic relations between words, not on conceptualizing via a world model built through experience, and then using language as an abstraction to describe the world. To the computer there is nothing outside tokens and their interrelations, but for people language is just a tool with which to describe the world with which we expect "intelligences" to cope. Which is what this article is examining.

LLMs model concepts internally, and this has been demonstrated empirically many times over the years, including recently by Anthropic (again). Of course, that won't stop people from repeating it ad nauseam.


nemjack 377 days ago

Concepts within modalities are potentially consistent, but the point the author is making is that the same "concept" vector may lead to inconsistent percepts across modalities (e.g., an image and a caption that conflict).


yahoozoo 376 days ago

Yes, LLMs often generate coherent, structured, multi-paragraph responses. But this coherence emerges as a side effect of learning statistical patterns in data, not because the model possesses a global plan or explicit internal narrative. There is no deliberative process analogous to human thinking or goal formation. There is no mechanism by which it consciously “decides” to think 50 tokens ahead; instead, it learns to mimic sequences that have those properties in the training data.

Planning and long-range coherence emerge from training on text written by humans who think ahead, not from intrinsic model capabilities. This distinction matters when evaluating whether an LLM is actually reasoning or simply simulating the surface structure of reasoning.


famouswaffles 376 days ago

>But this coherence emerges as a side effect of learning statistical patterns in data, not because the model possesses a global plan or explicit internal narrative.

That's not true.

https://www.anthropic.com/research/tracing-thoughts-language...
