| > REASON 1 This just means "simpler representations are not enough", not "good representations cannot be complex combinatorial combinations" (complex enough that it is very difficult for a human to see them). > REASON 2 Are you saying that I believe the only way to get human-like text is a near-infinite one-to-one mapping? This is obviously not the case. You can do, for example, a GAM time-series forecast. It can have a relatively low number of weights and still return very sensible predictions, yet not capture any real understanding of the phenomenon it predicts. For example, it does not understand causality, just correlation. > REASON 3 That is like saying "I've built an algorithm that is able to do 10 + 27, but there is an infinite list of numbers, so it is impossible for this algorithm to do 23113454453 + 1233253245". That is not true: you just decompose into (53+45), (44+32), ... and add rules to combine these elements together. That is what is happening with AI: there is enough data to get "some pattern" in the language. Just the patterns, not an understanding of the language itself. And these patterns can be reproduced in plenty of different places. > REASON 4 This argument is contradicted by "basic LLMs" or even simpler models that perform surprisingly well. Less well than SOTA, but if your argument were true, a CNN or ARIMAX could never do better than a coin toss. > REASON 5 Your example is a good case where the AI will _combine_ patterns learnt from different places. It will pick characteristics of each of your scenarios and mix them together. The result will look realistic, but it is still applying learnt patterns together. Also, you did not answer my point about human arithmetic, and all your reasons are contradicted by my example there. Humans DO maths partly because they "learnt by heart" some patterns rather than by applying an understanding of fundamental arithmetic. 
If "answering very well based on patterns" were not a good strategy, or required infinite weights, or made it impossible to use these patterns in novel situations, how do you explain that humans can do it themselves? As soon as we admit that humans do "some pattern, some of the time", then we have to admit that there is a continuous spectrum, and that output which looks realistic can be the result of pattern rather than understanding. By the way, I just saw a new article reaching HN: https://news.ycombinator.com/item?id=48410427 , and it explains similar things, illustrating that the best way for SOTA models to deal with arithmetic is by "not understanding it". And yet, when you use one of those SOTA models, you could invoke each one of your "REASON"s to claim that the model did understand arithmetic. |
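For reference, the decomposition in REASON 3 can be sketched roughly as below. This is only an illustration of the quoted idea, not anyone's actual model: the names `SUM_TABLE`, `split_chunks` and `add_by_patterns` are made up, and the two-digit chunk size is an assumption taken from the (53+45), (44+32) example.

```python
# Hypothetical sketch: addition from a finite memorized table plus a carry rule.
CHUNK = 2            # digits per chunk (assumption, matching the (53+45) example)
BASE = 10 ** CHUNK   # each chunk is a number in 0..99

# The "learnt by heart" patterns: every two-digit sum, 100 x 100 entries.
SUM_TABLE = {(a, b): a + b for a in range(BASE) for b in range(BASE)}


def split_chunks(n):
    """Split a non-negative integer into two-digit chunks, least significant first."""
    chunks = []
    while True:
        n, low = divmod(n, BASE)
        chunks.append(low)
        if n == 0:
            return chunks


def add_by_patterns(x, y):
    """Add two non-negative integers using table lookups and a carry rule."""
    xs, ys = split_chunks(x), split_chunks(y)
    length = max(len(xs), len(ys))
    xs += [0] * (length - len(xs))   # pad the shorter number with zero chunks
    ys += [0] * (length - len(ys))

    out_chunks, carry = [], 0
    for a, b in zip(xs, ys):
        total = SUM_TABLE[(a, b)] + carry   # memorized sum, plus carry (0 or 1)
        carry, low = divmod(total, BASE)    # combination rule: pass overflow on
        out_chunks.append(low)
    if carry:
        out_chunks.append(carry)

    # Reassemble the chunks, most significant first, as zero-padded digits.
    return int("".join(f"{c:0{CHUNK}d}" for c in reversed(out_chunks)))


print(add_by_patterns(23113454453, 1233253245))
```

The table is finite (10,000 entries) and the carry rule is fixed, yet the procedure covers arbitrarily long inputs. Whether that counts as "pattern" or "understanding" is exactly what is in dispute here.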
I just started out with mapping to be systematic. Mapping is ground zero, then interpolation, i.e. any smooth fitting function or basis, then combinatorial, where different bases are recognized and then projected according to their relevance to a new input.
Each of those increases modeling efficiency and power, but even combinatorial doesn't scale to problems like language.
I may be doing a poor job communicating. A formal breakdown of the scaling issues of lower-order modeling, scaled up to compensate, would make a great paper.
To prove me wrong (as a thought experiment), choose a lower-order model, any kind you can imagine that would qualify as modeling without understanding. Demonstrate that it can do anything close, and that it could possibly scale to the human corpus with just a trillion parameters.
If the number of parameters goes up far too fast, then that can't be the way deep learning solves the problem with a trillion, or a few billion, either.
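As a back-of-the-envelope version of that check for the ground-zero case only (pure one-to-one mapping), here is a small sketch. The 50,000-token vocabulary is an assumed, illustrative figure, and `table_entries` and `max_context_within_budget` are hypothetical helper names. It says nothing about interpolation or combinatorial models, which would need their own accounting.

```python
# Hypothetical sketch: how fast a pure lookup table over token contexts grows.
VOCAB_SIZE = 50_000        # assumed vocabulary size (illustrative only)
PARAM_BUDGET = 10 ** 12    # "just a trillion parameters"


def table_entries(context_len, vocab_size=VOCAB_SIZE):
    """Number of distinct contexts a one-to-one mapping would have to store."""
    return vocab_size ** context_len


def max_context_within_budget(budget=PARAM_BUDGET, vocab_size=VOCAB_SIZE):
    """Longest context length whose full lookup table still fits in the budget."""
    n = 0
    while table_entries(n + 1, vocab_size) <= budget:
        n += 1
    return n


for length in range(1, 5):
    print(length, table_entries(length))

print("max context within budget:", max_context_within_budget())
```

Under these assumed numbers the table already passes a trillion entries at a context of three tokens (50,000^3 = 1.25 x 10^14), so mapping alone is ruled out almost immediately.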
And consider the other side. We have no idea how our own brains are lifting up what is relevant vs. what is not. We are used to it happening. We call it "understanding". But we don't know how it works, how we work. Despite experiencing it.
What we do know is that, because combinatorial is too resource-intensive, we are not just combinatorial either.