Hacker News
by hellohello2 35 days ago
"Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind."

Modelling text describing the world is not modelling (some aspect of) the world?

Modelling the probability that a reader likes or dislikes a piece of text is not modelling (some aspect of) a reader's state of mind?


>Modelling text describing the world is not modelling (some aspect of) the world?

The text describes the world to humans. This is the crucial thing that you miss. It is very subjective.

Imagine that you learn the grammar of a foreign language without learning the meaning of the words. You might be able to make grammatically valid sentences. But you still will not understand a single thing that something written in that language describes. But that will be perfectly clear to someone who actually understands the meaning of the words.

When you train LLMs on large volumes of text that describe logically consistent facts in a million different ways, the "logic" sort of becomes part of the grammar that the model learns. That is, logic becomes a higher kind of "grammar", or an enormous set of grammatical rules that it captures. But that does not mean the model can do actual logic.

> Imagine that you learn the grammar of a foreign language without learning the meaning of the words. You might be able to make grammatically valid sentences. But you still will not understand a single thing that something written in that language describes. But that will be perfectly clear to someone who actually understands the meaning of the words.

so... back to chinese room arguments?

just because the amazon worker inside is just moving folders around following rules doesn't by default mean the room as a whole can't correspond to "something that understands"

denying emergence as a phenomenon isn't useful when "there are plenty of higher abstraction levels in multiple fields that still capture 99% of events and are easier to model and react to" is the counterpoint

> When you train LLMs on large volumes of text that describe logically consistent facts in a million different ways, the "logic" sort of becomes part of the grammar that the model learns. That is, logic becomes a higher kind of "grammar", or an enormous set of grammatical rules that it captures. But that does not mean the model can do actual logic.

This is the kind of stuff people were saying in 2023. But it’s 2026 now and LLMs aren’t just trained by reading lots of text anymore. That’s “pretraining”, and it’s still the first stage, but LLMs also have a huge amount of RLVR training where they actually do solve huge numbers of mathematical and logic puzzles and update their weights in response. They don’t just learn mathematics from reading about it now. They learn it by doing it. That is why they can now solve hard problems and prove theorems.

> that does not mean the model can do actual logic.

But they do, all the time. (Please tell me you’ve at least put a frontier LLM through its paces in the last 6 months?) If you think they can’t do logic and reasoning, can you provide examples of specific math or logic problems that you think a frontier LLM can’t do?

>If you think they can’t do logic and reasoning, can you provide examples of specific math or logic problems that you think a frontier LLM can’t do?

When a thing can "solve" a complex math problem without having the ability to count, then it is clear that this thing is not "reasoning" or doing "logic".

You didn’t answer my question. You just restated your claims.

Specific examples? Specific tasks?

Thanks for your explanation, I find it much more intuitive than the paper's.

In your opinion, does a Calculus solver model certain aspects of the world?

No? There's no model involved. It's all just probabilistic. LLMs understand what you're thinking as well as a mood ring.

It isn't possible to have "just probabilistic" (maybe a philosophical exception could be made for a uniform random distribution or whatever provides the little dose of randomness required to get nondeterministic results). Probabilities are always in the context of a model. LLMs model language, but language itself is a model of something else. My money would have been on language modelling nonsense, but that is quite clearly not the case. Turns out it models the world, and so do LLMs.

The literal definition of a model is "an informative representation of an object, person, or system". I think you mean something else though, what are you trying to express exactly?

The model is the thing which is learned in order to make the probabilistic prediction with low entropy.

Well, this is probably the same kind of semantic trap she's fighting with. Yes, you're right, it's a model. The distinction is that they are models of _language_ and not thoughts or feelings.

When I read your reply, I’m also modeling language. Tokens are just the discretization of the model’s eyes and ears. My brain does a huge amount of work to represent what’s happening in the world based on discrete information received from the outside world, just like language models do.

Sure, but you've also probably formed a model of who I am and what I'm thinking and formulated a response that isn't just grammatical and relevant but designed to provoke an outcome.

We're discussing whether they are models or not, not whether they have goals and agency. A language model does form a model of who you are and what you're thinking, because language is causally connected to those aspects of the generating distribution and modeling those aspects reduces cross-entropy.

RL provides the goals and agency. Pretraining provides the model.
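
To make the cross-entropy point concrete, here is a minimal sketch under loudly stated assumptions: the corpus, the speaker names, and the helper `cross_entropy` are all hypothetical, and the score is measured on the same toy data the counts come from. It only shows that a predictor which tracks a hidden cause of the text (here, who is speaking) scores lower cross-entropy than one that ignores it.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (speaker, next word). The speaker stands in for
# "aspects of the generating distribution" that shape the text.
corpus = [
    ("alice", "tea"), ("alice", "tea"), ("alice", "tea"), ("alice", "coffee"),
    ("bob", "coffee"), ("bob", "coffee"), ("bob", "coffee"), ("bob", "tea"),
]

def cross_entropy(pairs, predict):
    """Average negative log2-probability assigned to each observed word."""
    return -sum(math.log2(predict(ctx)[word]) for ctx, word in pairs) / len(pairs)

# Predictor 1: ignores the speaker entirely (plain word frequencies).
word_counts = Counter(word for _, word in corpus)
total = sum(word_counts.values())
unigram = {w: c / total for w, c in word_counts.items()}

# Predictor 2: keeps a separate distribution per speaker.
by_speaker = defaultdict(Counter)
for speaker, word in corpus:
    by_speaker[speaker][word] += 1
conditional = {
    s: {w: c / sum(cnt.values()) for w, c in cnt.items()}
    for s, cnt in by_speaker.items()
}

h_unigram = cross_entropy(corpus, lambda ctx: unigram)
h_conditional = cross_entropy(corpus, lambda ctx: conditional[ctx])

print(f"ignoring the speaker: {h_unigram:.3f} bits per word")
print(f"modelling the speaker: {h_conditional:.3f} bits per word")
```

Nothing here says how a real LLM represents a speaker internally; it only illustrates why a loss that rewards better prediction also rewards modelling whatever produced the text.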

Nothing about an LLM is “just”. In what precise sense do you mean it is probabilistic?

There's a reason stochastic was used in the original phrase instead of "probabilistic."

While most inference executions are intentionally non-deterministic, even a purely deterministic one would still be stochastic in that the model itself was built in a process such that the statistical frequency, sequencing, etc. of the training text and follow-up processes all heavily influence the result.

Because of that, the output is the sort of thing that is not expected to generate 100% perfect output 100% of the time, but to have a good probability of being like-in-kind-to-the-training-data (and useful/relevant as a result).

(As compared to a non-stochastic model, like arithmetic on integers, where 2+2 is always gonna be 4 and you don't have a chance of coming up with some novel pair of inputs to addition that will cause your arithmetic to miss the mark.)
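
A rough sketch of that distinction, assuming a toy bigram counter in place of a real LLM (the corpora and the helpers `train_bigram`, `greedy_next`, and `sampled_next` are all made up for illustration): greedy decoding is fully deterministic, yet its answer is still decided by how often each continuation appeared in training, whereas integer addition has no such dependence.

```python
import random
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def greedy_next(counts, prev):
    """Deterministic decoding: always pick the most frequent continuation."""
    return counts[prev].most_common(1)[0][0]

def sampled_next(counts, prev, rng):
    """Non-deterministic decoding: sample in proportion to the counts."""
    words, weights = zip(*counts[prev].items())
    return rng.choices(words, weights=weights, k=1)[0]

# Two hypothetical training sets that differ only in statement frequency.
corpus_a = "two plus two is four . two plus two is four . two plus two is five .".split()
corpus_b = "two plus two is five . two plus two is five . two plus two is four .".split()

model_a = train_bigram(corpus_a)
model_b = train_bigram(corpus_b)

answer_a = greedy_next(model_a, "is")  # decided by frequencies in corpus_a
answer_b = greedy_next(model_b, "is")  # decided by frequencies in corpus_b
sampled = sampled_next(model_a, "is", random.Random(0))

print(answer_a, answer_b, sampled, 2 + 2)
```

Same decoding rule, no randomness at inference time, and the two models still disagree because their training statistics differ.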

Agreed. My point was to question the use of “just” to obscure an incredibly complicated process, which has been shown repeatedly to rely on generalizations that are indistinguishable from world models.

Now, it is true that the world they’re modeling is the world of tokens. But insofar as those tokens, be they text or images or videos, are themselves modeling the real world, LLMs do have a model of the real world.