| > Well, I cannot do that, because for you, if it looks realistic, it has to have understanding. Yes, if it consistently produces good output for highly varied stimuli that can be intentionally picked to be unlikely to have ever had obvious representation in the training set, then yes, it understands. I think we are talking past each other a bit. A series of increasingly challenging datasets, used to capture scaling efficiencies, would ground our discussion. But the level of performance of these models is simply too good relative to the number of parameters for them to be doing anything trivial. Deep learning models do something combinatorial models do not. The linear tensor + non-linear transforms do two special things: 1. The tensor itself just projects a linear space into higher dimensions, but it's still the same information space. Project a 2D surface into higher dimensions linearly, and there can be more parameters, but there is no more information, since linear dependence expands to match. 2a. But then the nonlinearity thresholds, squashes or otherwise alters the linear results in a way that removes linear dependencies, increasing the useful dimensionality of the representation. 2b. And the squashing also allows dimensions to be folded down. So by both expanding and flattening representational dimensions, deep learning models are able to model higher-order relationships directly, where any less expressive modeling would require cobbling together many patches of fitting. Another way to put this is that deep learning models are able to learn higher-order relationships directly, not by memorizing and interpolating across learned points or regions. So a dramatically greater ability to "understand" is why deep learning models are so much better. They are not doing simple combinatorial fitting. "Understanding" or not, combinatorial relationships are the low bar for deep learning models; they are inherently great at learning much higher-order relationships.
I am falling asleep at this point. I feel like we need a blackboard and a computer. You are saying a lot of things that make me think, and make sense to me. |
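For what it's worth, items 1 and 2a of the quoted comment (a linear projection adds no information, a nonlinearity removes linear dependencies) can be checked numerically. Below is a minimal sketch, assuming a hand-picked 2x6 weight matrix `W`, a 5x5 grid of 2D points and ReLU as the nonlinearity. None of these come from the thread, and `matrix_rank` is a small illustrative helper, not a library function. It only illustrates the rank argument; it says nothing either way about "understanding".

```python
def matrix_rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting (illustrative helper)."""
    m = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Assumed toy data: a 5x5 grid of points on a 2D surface.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
points = [(x, y) for x in grid for y in grid]

# Assumed weights: project 2D -> 6D. More numbers per point, same information.
W = [[1, 0, 1, 1, 2, -1],
     [0, 1, 1, -1, 1, 2]]

# Linear layer only: every 6D column is a combination of x and y.
linear = [[x * W[0][j] + y * W[1][j] for j in range(6)] for x, y in points]

# Same layer followed by ReLU, which thresholds each coordinate at zero.
relu = [[max(0.0, v) for v in row] for row in linear]

rank_linear = matrix_rank(linear)
rank_relu = matrix_rank(relu)
print("rank after linear projection:", rank_linear)
print("rank after ReLU:", rank_relu)
```

The linear projection stays at rank 2 however many output dimensions are added, while the thresholded version has a strictly higher rank. That is the "removes linear dependencies" step in isolation.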
You keep saying "what I observe with GenAI can only be the result of 'understanding'" without providing any proof at all. Just a few beliefs.
You just say "look at this behavior, that's the proof". I truly don't think it is: nothing proves that this behavior requires 'understanding'. And nothing you provided helps: all you provided are impressive behaviors and then the unsubstantiated conclusions "and this behavior can only be done with understanding".
At the same time, there are too many clues showing that such behavior does not require understanding, even if it _looks_ incredibly clever:
1. GenAI does not understand (after the training phase) things that humans don't understand. If GenAI had the capacity to build an understanding during training, there would be no reason for this understanding to coincide with human understanding.
2. Optimisation does not always lead to "understanding". Human brains choose to optimise "learning the multiplication table by heart" rather than building a pocket calculator inside the neurons.
3. Human brains, which do have "understanding", work in a fundamentally different way from GenAI (flow of thoughts, intrinsically intertwined memory and compute, optimised for processing a world model rather than tokens, ...). It is an unsubstantiated jump to conclude that AI has "understanding" when its behavior could just as well be the result of those fundamental differences.
4. "Basic" LLMs are surprisingly good at creating convincing sentences, and yet there are situations where it is blatantly clear they did not understand anything. More advanced SOTA models are refinements of "basic" LLMs, so the "sentence construction done without understanding" is still in use, and it hinders the SOTA model from building a full understanding.
> Another way to put this is that deep learning models are able to learn higher-order relationships directly, not by memorizing and interpolating across learned points or regions.
It's exactly what I'm saying: deep learning models are very good at learning complex relationships. Such as "I don't know what 'Paris' is, I don't have any understanding of what a city is in reality, but when the token Paris is associated with these other tokens in this complex order, even if I never saw it before, I have learnt the complex relationships and therefore I'm able to build a series of tokens".
They are very good at learning complex relationships that allow them to choose the correct combination even if they did not "understand" the content of the correct combination.
I understand that it is impressive: those relationships are very complex and very numerous (there are billions of them). It is easier to anthropomorphize and conclude that the AI has "understood".
But again, the main problem is that you just assert, without any proof, "no, I cannot believe that, I refuse to believe that".
(and, by the way, I personally think that AI (SOTA, but even "basic" LLMs too) does have 'rules' that correspond to some kind of understanding of basic mechanisms. I think they have basic "world models". But these world models are optimised "to write text" rather than to "understand the world", and therefore the large majority of AI output is just token chains that are not understood)