| HN Mirror

I can't suggest a new test no, it is a hard problem and identifying problems is usually easier than solving them.

I'm just trying to say that strong claims require strong evidence, and a claim that LLM's can have theory of mind and thus "understand that other people have different beliefs, desires, and intentions than you do" is a very strong claim.

It's like giving students the math problem of 1+1=2 and loads of examples of it solved in front of them, and then testing them on you have 1 apple, and I give you another apple, how many do you have, and then when they are correct saying that they can do all additive based arithmetic.

This is why most benchmark tests have many many classes of examples, for example looking at current theory of mind benchmarks [1], we can see slightly more up to date models such as o1-preview still scoring substantially below human performance. More importantly by simply changing the perspective from first to third person, accuracy drops in LLM models by 5-15% (percent score, not relative to its performance), whilst it doesn't change for human participants, which tells you that something different is going on there.

[1]: https://arxiv.org/html/2410.06195v1