by cauch
11 days ago
Two problems with that. Firstly, how do you know that the optimal way to highly compress complex information is to understand it? You think it is obvious because you are very familiar with "understanding" as a way to summarise complex information. But there could be billions of other ways, outside of human imagination, that are as good or even better. Secondly, LLMs don't find the optimal way; they find a local minimum. Everyone who has worked with NNs knows that they are prone to coming up with spurious patterns, incorrect correlations and bad workarounds to guess the correct answer. You regularly need to nudge the NN with specifically engineered features to keep it from falling into the first local minimum. When it comes to LLMs, it is extremely difficult to check whether the model has latched onto a misleading pattern that, by chance, links two "tokens" together, or onto a real concept that indeed links them. Basic probability implies that there are probably tons of "fake patterns" engraved into the weights during LLM training, "fake patterns" that should not exist if there were any kind of "understanding" of the abstract mechanism that links these tokens.
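To make the "basic probability" point concrete, here is a minimal sketch in plain Python. It is only an illustration, not a claim about what actually happens inside an LLM: the sample count, the feature count and the function names are all assumptions chosen for the example. It generates a random target and many random features that are unrelated to it by construction, then reports the strongest correlation found purely by chance.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_samples=20, n_features=2000, seed=0):
    """Largest |correlation| between a random target and unrelated random features.

    Every feature is drawn independently of the target, so any correlation
    found here is a "fake pattern" by construction (hypothetical setup).
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    print(f"strongest chance correlation: {strongest_chance_correlation():.2f}")
```

With few samples and many candidate features, at least one feature will almost always look strongly predictive even though nothing links it to the target. A model that minimises training error has no way, from this data alone, to tell that feature apart from a real one.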
What is your non-performance baseline for "Understanding"? We don't have such a measure for humans.
Understanding is the behavioral ability demonstrated by learning to model something complex well, beyond mappings, associations, and interpolations.
Models clearly do this. Mix up the most unlikely combination of non-trivial subjects, and they respond sensibly. Those responses are not averages, interpolations of any order, or even combinatorial interactions.
There is a reason those kinds of encodings, mappings, associations, interpolations, and statistics / stochastics all failed miserably for decades. They still fail. It took topological transforms, reminiscent of how we compute (dendrite-soma-axon, tensor-sum-nonlinear), and then they leapt several orders of magnitude ahead of any alternative.
The problem with models composed of relationships of lower order than the phenomena they are trying to model is that they require combinatorially more parameters to model anything complex.
For simple problems, poor models fail gracefully. For complex problems, poor models just fail.
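One way to read the parameter-count claim above is with a toy case, sketched here in Python. The choice of n-bit parity and both function names are assumptions made for illustration, not something from the comment itself. A model that only stores input-to-output associations needs one entry per input pattern, while a model whose structure matches the problem (a chain of XORs) needs a number of operations that grows linearly with the input size.

```python
from itertools import product


def parity_table(n):
    """Pure association model: one stored entry for each of the 2**n inputs."""
    return {bits: sum(bits) % 2 for bits in product((0, 1), repeat=n)}


def parity_composed(bits):
    """Structured model: fold XOR over the bits, one operation per bit."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc


if __name__ == "__main__":
    for n in (4, 8, 16):
        table = parity_table(n)
        # Both models agree on every input; only their size differs.
        assert all(parity_composed(bits) == out for bits, out in table.items())
        print(f"n={n}: table entries={len(table)}, XOR operations={n}")
```

The table doubles with every added bit while the composed version grows by one operation, which is the sense in which the lower-order model "just fails" once the problem gets large.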