| I strongly disagree but have an upvote for a well-argued comment. >> The question is, does an LLM in the course of modelling asymmetric correlations, develop something analogous to an explanatory model. I think so, in the sense that a good statistical model will intrinsically capture explanatory relations. A statistical model may "capture" explanatory relations, but can it use them? A data scientist showing a plot to a bunch of people is explaining something using a statistical model, so obviously the statistical model has some explanatory power. But it's the data scientist that is using the model as an explanation. I think the discussion is whether a statistical model can exist that doesn't just "capture" an explanation, but can also make use of that explanation like a human would, for example as background knowledge to build new explanations. That seems very far-fetched: a statistical model that doesn't just model, but also introspects and has agency. Anyway I find it very hard to think of language models as explanatory models. They're predictive models, they are black boxes, they model language, but what do they explain? And to whom? The big debate is that (allegedly) "we don't understand language models" in the first place. We have a giant corpus of incomprehensible data; we train a giant black box model on it; now we have a giant incomprehensible model of the data. What did we explain? >> But this is in the solution space of good models of statistical regularity of an external system. To maximally predict the next token in a sequence just requires a model of the process that generates that sequence. Let's call that model M* for clarity, and call the search space of models S. There are any number of models in S that can generate many of the same sequences as M* without being M*. The question is, and has always been, in machine learning, how do we find M* in S, without being distracted by M_1, M_2, M_3, ..., ... that are not M*.
Given that we have a very limited way to test the capabilities of models, and that models are getting bigger and bigger (in machine learning anyway) which makes it harder and harder to get a good idea of what, exactly, they are modelling, how can we say which model we got a hold of? |
That's the beauty of autoregressive training: the model is rewarded for capturing and utilizing explanatory relations because they have an outsized effect on prediction. It's the difference between frequency counting that treats the past context as an opaque unit and decomposing the past context, leveraging the relevant tokens for generation while ignoring the irrelevant ones. Self-attention does this by searching over all pairs of tokens in the context window for relevant associations. Induction heads[1] are a fully worked-out example of this and help explain in-context learning in LLMs.
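To make the contrast concrete, here is a toy sketch in Python. It is not how a transformer is implemented; it only contrasts a lookup table keyed on the whole recent context with a hand-written rule that mimics what an induction head is described as doing (find the earlier occurrence of the current token and copy what followed it). All function names here are hypothetical and made up for illustration.

```python
from collections import Counter, defaultdict


def train_context_counter(sequences, context_len):
    """Frequency counting: the last `context_len` tokens form one opaque key."""
    table = defaultdict(Counter)
    for seq in sequences:
        for i in range(context_len, len(seq)):
            key = tuple(seq[i - context_len:i])
            table[key][seq[i]] += 1
    return table


def predict_by_counting(table, context, context_len):
    """Return the most frequent continuation of the exact context, if seen."""
    key = tuple(context[-context_len:])
    if key not in table:
        return None  # unseen context: the lookup table has nothing to offer
    return table[key].most_common(1)[0][0]


def predict_by_induction(context):
    """Hypothetical induction-style rule (an illustration, not a real head):
    find the most recent earlier occurrence of the current token and
    copy the token that followed it."""
    current = context[-1]
    for i in range(len(context) - 2, -1, -1):
        if context[i] == current:
            return context[i + 1]
    return None


# The counter is trained only on sequences made of "a" and "b".
table = train_context_counter([["a", "b", "a", "b", "a"]], context_len=2)

# A prompt made of tokens that never appeared in training.
prompt = ["x", "y", "z", "x"]
counting_guess = predict_by_counting(table, prompt, context_len=2)
induction_guess = predict_by_induction(prompt)
```

On the unseen prompt the counting table returns nothing, while the copy rule still produces a continuation because it uses the structure inside the context rather than the identity of the whole context. That is the sense in which decomposing the context generalizes where frequency counting cannot.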
>Anyway I find it very hard to think of language models as explanatory models. They're predictive models, they are black boxes, they model language, but what do they explain? And to whom?
The model encodes explanatory relationships of phenomena in the world and it uses these relationships to successfully generalize out of distribution when generating. Basically, these models genuinely understand some things about the world. LLMs exhibit linguistic competence as they engage with subject matter to accurately respond to unseen variations in prompts on that subject matter. At least in some cases. I argue this point in some detail here[2].
>how can we say which model we got a hold of?
More sophisticated tests, ideally ones that can isolate exactly what was in the training data in comparison to what was generated. I think the wide variety of poetry these models generate should strongly raise one's credence that they capture a sufficiently accurate model of poetry. I go into detail on this example in the post linked above[2]. Aside from that, various ways of testing in-context learning can do a lot of work here[3].
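As a rough sketch of what "isolating what was in the training data" could look like in the simplest case, here is a Python snippet that measures how many n-grams in a generated text never occur in a known training corpus. This assumes you actually have the training text, which is usually not true for production LLMs, and whitespace tokenization is a simplification. The function names are hypothetical, and a high novelty score is only weak evidence of generalization, not a proof.

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-token tuples in `tokens`."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def novel_ngram_fraction(training_texts, generated_text, n=3):
    """Fraction of n-grams in the generated text absent from the training texts.

    Hypothetical helper for illustration: 0.0 means every n-gram was seen in
    training (possible memorization), 1.0 means none of them were.
    """
    seen = set()
    for text in training_texts:
        seen |= ngrams(text.split(), n)
    generated = ngrams(generated_text.split(), n)
    if not generated:
        return 0.0  # text shorter than n tokens: nothing to compare
    return len(generated - seen) / len(generated)


training = ["the cat sat on the mat"]
copied = novel_ngram_fraction(training, "the cat sat on the mat")
varied = novel_ngram_fraction(training, "the cat sat on the rug")
```

Here the verbatim copy scores 0.0 and the variant scores 0.25, since one of its four trigrams is new. A real test would need to go further than surface overlap, for example by checking whether novel outputs still respect the constraints of the form being generated.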
[1] https://transformer-circuits.pub/2022/in-context-learning-an...
[2] https://www.reddit.com/r/naturalism/comments/1236vzf/
[3] https://twitter.com/leopoldasch/status/1638848881558704129