|
The "language models don't really understand anything" corner is getting smaller and smaller. In the last few months we've seen pretty definitive evidence that transformers can recombine concepts ([1], [2]) and do simple logical inference using contextual information ([3], "make the score font color visible"). I see no reason that this technology couldn't smoothly scale into human-level intelligence, yet lots of people seem to think it'll require a step change or is impossible. That being said, robust systematic generalization is still a hard problem. But "achieve symbol grounding through tons of multimodal data" is looking more and more like the answer.
[1] https://openai.com/blog/dall-e/
[2] https://distill.pub/2021/multimodal-neurons/
[3] https://openai.com/blog/openai-codex/ |
In my mind, understanding a thing means you can justify an answer, like a student showing their work and being able to defend it. A system that gives an answer with a proof understands the answer with respect to the proof it provides. E.g. to understand an answer with regard to first-order logic, it has to be able to defend a logical deduction of that answer.
These models still can't justify their answers very well, so I'd say they're accurate but only understand with respect to a fairly dumb proof system (e.g. they can select relevant passages or just appeal to overall accuracy statistics). They're still far from being able to justify answers in the various ways we do, which I'd say means, by definition, that they still don't understand with regard to the "proof systems" we use to understand things.
Maybe the next step will require increasingly interesting justification systems.