| HN Mirror

	by aixpert 134 days ago
	They do model the world. Watch Nobel Prize winner Hinton, or let's admit that this is more of a religious question than a technical one.

2 comments

D-Machine 134 days ago

They model the part of the world that (linguistic models of the world posted on the internet) try to model. But what is posted on the internet is not IRL. So, to be glib: LLMs trained on the internet do not model IRL, they model talking about IRL.

imtringued 133 days ago

His point is that human language and the written record are a model of the world, so if you train an LLM you're training a model of a model of the world.

That sounds highly technical if you ask me. People complain if you recompress music or images with lossy codecs, but when an LLM does that suddenly it's religious?

hackinthebochs 133 days ago

A model of a model of X is a model of X, albeit extra lossy.
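A toy sketch of that claim, using quantization as a stand-in for "modeling": the second approximation is built only from the first, yet it still tracks the original values, just with a larger error. The data, the step sizes, and the helper names `quantize` and `max_error` are all made up for illustration and say nothing about how an LLM actually works.

```python
def quantize(values, step):
    """Lossy 'model': snap each value to the nearest multiple of step."""
    return [round(v / step) * step for v in values]


def max_error(reference, approx):
    """Largest absolute deviation between two equal-length sequences."""
    return max(abs(r - a) for r, a in zip(reference, approx))


# Hypothetical "world" values (assumption: any list of floats would do).
world = [0.13, 0.48, 0.91, 1.37, 1.62, 2.08]

first = quantize(world, 0.1)    # a model of the world
second = quantize(first, 0.25)  # a model of the model; never sees `world`

err_first = max_error(world, first)
err_second = max_error(world, second)

print(f"model of X:            max error {err_first:.3f}")
print(f"model of a model of X: max error {err_second:.3f}")
```

For these inputs the second-stage error is larger than the first-stage error, but it stays within the sum of the two half-steps (0.05 + 0.125), so the composed approximation is still an approximation of the original. Whether that analogy carries over to language and world models is exactly what the thread disputes.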

D-Machine 133 days ago

An LLM has an internal linguistic model (i.e. it knows token patterns), and that linguistic model models humans' linguistic models (a stream of tokens) of their actual world models (which involve far, far more than linguistics and tokens, such as logical relations beyond mere semantic relations, sensory representations like imagery and sounds, and, yes, words and concepts).

So LLMs are linguistic (token pattern) models of linguistic models (streams of tokens) describing world models (more than tokens).

It thus does not in fact follow that LLMs model the world (as they are missing everything that is not encoded in linguistic semantics).

hackinthebochs 132 days ago

At this point, anyone claiming that LLMs are "just" language models isn't arguing in good faith. LLMs are a general purpose computing paradigm. LLMs are circuit builders: the converged parameters define pathways through the architecture that pick out specific programs. Or as Karpathy puts it, LLMs are a differentiable computer[1]. Training LLMs discovers programs that reproduce the input sequence well. Tokens can represent anything, not just words. Roughly the same architecture can generate passable images, music, or even video.

[1] https://x.com/karpathy/status/1582807367988654081

tovej 132 days ago

If it's an LLM it's a (large) language model. If you use ideas from LLM architecture in other non-language models, they are not language models.

But it is extremely silly to say that "large language models are language models" is a bad faith argument.

hackinthebochs 131 days ago

No, it's extremely silly to use the incidental name of a thing as an argument for the limits of its relevance. LLMs were designed to model language, but that does not determine the range of their applicability, or even the class of problems they are most suited for. It turns out that LLMs are a general computing architecture. What they were originally designed for is incidental. Any argument that starts off "but they are language models" is specious out of the gate.

tovej 133 days ago

In this case this is not so. The primary model is not a model at all, and the surrogate has bias added to it. It's also missing any way to actually check the internal consistency of statements or otherwise combine information from its corpus, so it fails as a world model.
