by izzygonzalez 1219 days ago
Abstract:

Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training.

Our results show that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children.

These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models' improving language skills.

1 comments

> These findings suggest that ToM-like ability (thus far considered to be uniquely human)

What it suggests to me is that the particular “Theory of Mind” tasks involved actually test the ability to process language and generate appropriate linguistic results, not theory of mind.

It also suggests (with the “thus far considered to be uniquely human”) that the authors are unaware of other theory of mind tests that are behavior dependent rather than language dependent. The validity of those tests is controversial, as is also true of the linguistic tests, but a number of non-human primates, non-primate mammals, and even some birds (parrots and corvids, particularly) have shown evidence of theory of mind on them.

It's hard to look at behaviour separately from language if the only behaviour available is to generate text. As long as we don't have a test agnostic of medium, this will have to do.

In the end, we can't overcome the limitation that all we can empirically see is the ability to process X and generate appropriate Y. If that invalidates the test where X is language and Y is language, what stops us from invalidating any possible X and Y? That would leave us no empirical method to work with.

We cannot assume that, because text generation is all these models do, it must be possible to get answers to the questions we want to ask by examining their textual responses.

It is fair to ask why, if we accept these verbal challenges as good evidence for a theory of mind in children, we would not accept them for these models. But children have nothing like the memory for text that these models have, and the corpus of text that these models have been trained on includes a great many statements that tacitly represent their authors' theory of mind (i.e. they are the sort of statements that would typically be made by someone having a theory of mind, just as arithmetically correct statements concerning quantities are to be expected from people who know arithmetic).

To be clear, I am not arguing that it would be impossible to show a theory of mind in a system that can only interact through text, but personally, I think it will require a model with greater capabilities than responding to prompts. For example, when models can converse among themselves, I think we will know.

> To be clear, I am not arguing that it would be impossible to show a theory of mind in a system that can only interact through text

I think you are, because

> a model with greater capabilities than responding to prompts

interacts in other ways than text.

Even then, I don't see what's so special about language that it needs to be separated from other ways of interaction. If language is not enough to derive empirical answers, why should physical movements or radio emissions be?

Even if you don't assume that it's necessarily impossible to get the answers empirically for a text-based model, you must keep in mind that this possibility remains open. Perhaps we will never find out if language models have a theory of mind.

However, judging by the discussions around the topic, very few people highlight the unknowability. If I have to choose between "yes" or "no" while the reality is "maybe", I'd choose a "yes" purely out of caution.

Two models having a coherent conversation - a scenario which follows directly from my post - would be a purely textual example of what I mean.

> Perhaps we will never find out if language models have a theory of mind.

We appear to be in agreement here.

When the state of our knowledge is 'maybe', it seems rash to assume either 'yes' or 'no'.

What does it change when you add another model? I don't see how this lets us extract extra information.

What distinguishes two conjoined models from one model with a narrowing across the middle?

If the idea is to have two similar minds building a theory of each other, then I guess this could be informative, but first we'd have to establish that the models are "minds" in the first place. It's not clear to me what that requires.