| There's a very important difference, though: humans can and often do qualify their interpretation. One major flaw of current PAC-based models is their inability to express how certain they are and to present the rationale behind their (lack of) confidence. One crucial aspect of human language processing is the ability to question oneself and adjust one's interpretation. During a dialog this can be done by actively asking for more context or confirmation. So unlike current ML models, humans know that their interpretation/models might be wrong and can actively seek confirmation or correction. A trivial example would be asking a digital assistant "What's the capital of Georgia?"
There are two possible answers and there's no way of knowing what the "correct" one is without further context. So a human would likely either ask whether you mean the country of Georgia or the US state, or qualify their answer ("the capital of the country/the state capital is Tbilisi/Atlanta"). This is what GPT-J-6B returns: Q: What's the capital of Georgia?
A: Atlanta
When given just the prompt "What's the capital of Georgia", GPT-J-6B also returns Atlanta and generates further text about the US state and its capital. And it's not as if the model doesn't "know": Q: What is Tbilisi?
A: Tbilisi is the capital of Georgia.
It's even more interesting when logic is involved: Q: If Albert is a pig and all pigs are blue, what colour is Albert?
A: Albert is a blue pig.
I only asked for the colour, but close enough. Can GPT deal with ambiguity? Q: If Albert is a pig and all pigs are blue, is Albert happy?
A: No.
Awesome! How about some common sense then: Q: If Alice is a dog and Bob just finished high school, can they have kids together?
A: Yes, but they will have to get married first.
So yeah, GPT obviously has no actual language understanding. Bob having finished high school implies that he's a young human man, while Alice is a dog. Humans and dogs can't have kids together. But if the model doesn't understand language, why did it do so well on the logic test? To find out, I just asked :) Q: If Albert is a pig and all pigs are blue, why is Albert not happy?
A: Because he is a pig.
Hm. So are all pigs unhappy then? Let's take them out of the equation and ask again: Q: If Albert is blue, is Albert unhappy?
A: No, Albert is not unhappy.
Bingo! So the model only appeared to know the subtlety of "blue" having multiple meanings. It's as dumb as a rock and simply tried to match the tokens "is blue" against "is unhappy". Replacing "blue" with "pig" confirms this: Q: If Albert is a pig, is Albert unhappy?
A: No, Albert is not unhappy. Albert is a pig.
Dad humour. Well, either that or just a dumb model capable of fooling us humans because we tend to read more intelligence into its replies than is actually there... EDIT: in case you want to test this yourself (you might get different results, it's probabilistic after all): https://6b.eleuther.ai When using the Q/A format, make sure not to add a space after the "A:" prompt. |
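For anyone who wants to repeat the probes, here is a minimal sketch of building the prompts in the Q/A format described above. The helper name `qa_prompt` and the `PROBES` list are made up for illustration, and actually sending the text to the model is left out. The only point it illustrates is that the prompt ends in "A:" with no trailing space, probably because the tokenizer tends to attach the space to the word that follows, though that explanation is my assumption.

```python
def qa_prompt(question: str) -> str:
    """Format a question in the Q/A style used in the comment above.

    The prompt ends with "A:" and deliberately has no trailing space,
    so the model produces the answer (and its leading space) itself.
    """
    return f"Q: {question.strip()}\nA:"


# Probes copied from the comment above (hypothetical list name).
PROBES = [
    "What's the capital of Georgia?",
    "What is Tbilisi?",
    "If Albert is a pig and all pigs are blue, what colour is Albert?",
    "If Albert is a pig and all pigs are blue, is Albert happy?",
    "If Albert is blue, is Albert unhappy?",
    "If Albert is a pig, is Albert unhappy?",
]

if __name__ == "__main__":
    for question in PROBES:
        # repr() makes the newline and the missing trailing space visible.
        print(repr(qa_prompt(question)))
```

Paste each prompt into the demo as-is. Since sampling is probabilistic, running the same prompt several times gives a better picture than a single completion.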
–––––––––––––––––––––––––––––––––
So does it understand after all? Or is trying different prompts a form of p-hacking here?