| HN Mirror



	by Animats 361 days ago
	That LLMs are a black box and that LLMs lack an underlying model are both true, but orthogonal. It's possible to have a black box system which has an underlying model. That's true of many statistical prediction methods. Early attempts at machine learning were a white box with no underlying model. This is true of most curve-fitting. The AI version was where you're trying to divide a high-dimensional space with a cutting plane to create a classifier. You can tell where the separating plane is, but not why. The lack of a world model is a very real limitation in some problem spaces, starting with arithmetic. But this argument is unconvincing.
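To make the cutting-plane point concrete, here is a minimal sketch of a classic perceptron in plain Python. The data and the rule that labels it (`x[0] + x[1] - x[2]`) are made up for illustration and are not taken from any system discussed here. The learned weights say exactly where the separating plane is, but nothing in them explains why the labels fall that way.

```python
import random

def train_perceptron(points, labels, epochs=1000):
    """Classic perceptron: find w, b such that sign(w.x + b) matches each label."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the plane): nudge the plane
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the plane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical data: 3-D points labeled by a hidden linear rule.
# Points too close to the boundary are dropped so the classes are cleanly separable.
rng = random.Random(0)
points, labels = [], []
while len(points) < 200:
    x = [rng.uniform(-1, 1) for _ in range(3)]
    hidden_score = x[0] + x[1] - x[2]
    if abs(hidden_score) > 0.5:
        points.append(x)
        labels.append(1 if hidden_score > 0 else -1)

w, b = train_perceptron(points, labels)
print("separating plane: w =", w, "b =", b)
```

Everything the classifier "knows" is the list `w` and the number `b`. That is fully inspectable, so it is a white box, yet it carries no model of how the data was generated.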

2 comments

seanhunter 361 days ago

“LLMs lack an underlying model” is very obviously incorrect. LLMs have an underlying model of semantics as tokens embedded into a high-dimensional vector space.

The question is not whether or not they have any model at all, the question is whether the model they indisputably have (which is a model of language in terms of linear algebra) maps onto a model of the external universe (a “world model”) that emerges during training.

This is pretty much an unfalsifiable question as far as I can see. There has been research that aims to show this one way or another and it doesn’t settle the question of what a “world model” even means if you permit a “world model” to mean anything other than “thinks like we do”.

For example, LLMs have been shown to produce code that can make graphics somewhat in the style of famous modern artists (e.g., Kandinsky and Mondrian) but fail at object-stacking problems (“take a book, four wine glasses, a tennis ball, a laptop and a bottle and stack them in a stable arrangement”). Depending on the objects you choose, the LLM either succeeds or fails (generally in a baffling way). So what does this mean? Clearly the model doesn’t “know” the shape of various 3-D objects (unless the problem is in its training set, which it sometimes seems to be), but on the other hand it seems to have shown some ability to pastiche certain visual styles. How is any of this conclusive? A baby doesn’t understand the 3-D world either. A toddler will try and fail to stack things in various ways. Are they showing the presence or lack of a world model? How do you tell?


danaris 361 days ago

I agree that it's probably unfalsifiable in the sense of proving it definitively based on something like static analysis of the model itself.

But that doesn't mean that we can't, in theory, give the LLM a battery of tests on which it should perform well (though not perfectly) if it has a world model, and poorly (though not failing totally) if it doesn't.

It's inherently a probabilistic system, so testing it in a probabilistic manner seems perfectly apt. Again: no, this will not produce a definitive result, due to that probabilistic nature—but it can produce an indicative one, and running the same test on multiple related LLMs, or similar tests on the same LLM, should help to smooth out noise in the results.

(...of course, this only works if the tests are designed well, and I don't have enough specific understanding of LLMs to know how one would go about doing that in a rigorous manner!)
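A rough sketch of what that kind of probabilistic testing could look like, using an exact one-sided binomial test in plain Python. The model names, pass counts, and the 0.5 baseline below are all hypothetical placeholders, not real results. Picking a defensible baseline (the pass rate you would expect from a system without a world model) is exactly the hard test-design part mentioned above.

```python
from math import comb

def binomial_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): how likely is a score of k or more
    out of n if the true pass rate were only the baseline p0?"""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical results: (model name, tasks passed, tasks attempted).
results = [("model_a", 34, 50), ("model_b", 29, 50), ("model_c", 31, 50)]

# Assumed pass rate for a system that lacks the capability being tested.
baseline = 0.5

for name, passed, total in results:
    p_value = binomial_tail(passed, total, baseline)
    print(f"{name}: pass rate {passed / total:.2f}, "
          f"P(score this high at baseline) = {p_value:.4f}")

# Pooling related models smooths out per-model noise. This treats all runs as
# independent trials of one shared pass rate, which is a simplifying assumption.
pooled_passed = sum(passed for _, passed, _ in results)
pooled_total = sum(total for _, _, total in results)
pooled_p = binomial_tail(pooled_passed, pooled_total, baseline)
print(f"pooled: {pooled_passed}/{pooled_total}, "
      f"P(score this high at baseline) = {pooled_p:.4f}")
```

A small tail probability is only indicative: it says the scores are hard to explain by the assumed baseline, not that a world model is present. A large one says the model did not beat the baseline on these particular tasks.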


bloaf 361 days ago

I don't think it's nearly as cut-and-dried as that. Even if you tried to make tests to differentiate world-model from non-world-model, all you'd end up concluding is:

If the AI has a world model, its world-model doesn't have features that allow it to do what I tested for.


danaris 360 days ago

In theory, if you have some people who know what they're doing, they could design enough different kinds of world-model tests that, if the LLM failed them, the likelihood of it having a world model would be significantly reduced.

I think I would probably word the distinction I would draw as "it is technically unfalsifiable, but it is not untestable."


comp_throw7 361 days ago

> LLMs lack an underlying model

Obviously false for any useful sense by which you might operationalize "world model". But agree re: being a black box and having a world model being orthogonal.
