|
No, the number of fingers is absolutely critical; you simply fail to understand what understanding a concept means. ChatGPT is, in this sense (and I know it’s probably a gross simplification, sorry OpenAI), a fancy Markov chain. It can sample data that, by sheer correlation in the training data, reflects many features a typical drawing that matches the prompt would have. But it doesn’t understand what a human hand is; it lacks the ability to abstract, so it cannot deduce that there are (almost) always exactly 5 fingers. The probabilities reflected in the model kinda caught the aspect of a human hand being a collection of these geometric shapes with those shades and whatnot, but they didn’t catch the hand as a coherent concept with a rather strict definition. This is like the joke about training a neural net on arithmetic where you get the wrong answer repeatedly until it remembers to answer 5+5=10. But then, until there are more data, 10+5 is also 10, because it didn’t actually understand arithmetic (to be fair, a human wouldn’t understand it from a single example either). And you can see this in action by making ChatGPT believe 5+3=7. ChatGPT manipulates symbols, and it captured the rules to do that very well. That is one of the abilities of general intelligence, but it’s not the only criterion for intelligence. You can do more than just manipulate symbols: you can also abstract over them, form your own thoughts and questions about them, be curious, reflect on your reasoning and explain it, and deduce patterns from very little data because you have all the context from your previous knowledge and the abstractions built on top of it. Besides the abstractions and functions programmed into it, ChatGPT only has probabilities of symbols being related to other symbols (and the rules implied by that), but it cannot reason about these symbols and cannot form creative thought.
Its "intelligence" is limited to a finite order/level of abstraction (the features and parameters that define the model and allow it, for example, to capture shading and geometry, but not the concept of a human hand) while yours is basically limitless. You can always put another abstraction on top of what you just thought or experienced. The magic of deep learning was basically increasing the order of abstraction a neural net can capture, but it’s still limited. On the other hand, I have my pet theory that the weirdness of dreams or psychedelics arises from the brain basically sampling its own connections / piping random noise through its neural net (as a side effect of all the reorganization it’s doing). |
Yet it correctly understands that faces typically have two eyes, one mouth, one nose, etc. So clearly this "lack of understanding that hands have five fingers" is unlikely to be inherent to the model.
Let's say I ask you to draw a ladybug. You'll draw a red shell with some black dots haphazardly strewn about. However, the most common ladybug in Europe always has 7 spots. It's unlikely that your drawing will reflect that. Why? Because you lack understanding of Coccinella septempunctata. But does that mean you lack understanding in general? Of course not. Ladybugs simply aren't important to you.
So again, why are we elevating hands to be the litmus test of understanding? Yes, hands are important to humans. But this algorithm is not a human, so hands are no more important to it than anything else it can do. Let's say it could draw perfect hands 100% of the time. Does that mean you would concede that it has understanding? I doubt it. You'd pick some other thing it didn't do well and say "See, it can't accurately draw eggs stacked in a pyramid, therefore it lacks understanding." The issue with your argument is that it is a slippery slope without a specific reason why the correct rendering of hands is important.
And I'm not arguing that GPT-3 or Stable Diffusion are omnipotent. Clearly they're not. But that doesn't mean they can't understand things in their domain. As others have mentioned in adjacent comments, the only test we have for understanding, in humans or ML models, is measuring the correctness of an output for a given input. Essentially, your argument is that "It's an algorithm, it can't understand like a human," which is begging the question.
I'm not claiming that ChatGPT or any other ML algorithm is "generally intelligent." Just that it has an understanding of certain concepts.