I won't define reasoning, just call out one aspect.
We have the ability to follow a chain of reasoning, say "that didn't work out", backtrack, and consider another. ChatGPT seems to get tangled up when its first (very good) attempt goes south.
This is definitely a barrier that can be crossed by computers. AlphaZero is better than we are at it. But it is a thing we do which we clearly don't simply do with the probabilistic regurgitation method that ChatGPT uses.
That said, the human brain combines a bunch of different areas that seem to work in different ways. Our ability to engage in this kind of reason, for example, is known to mostly happen in the left frontal cortex. So it seems likely that AGI will also need to combine different modules that work in different ways.
On that note, when you add tools to ChatGPT, it suddenly can do a lot more than it did before. If those tools include the right feedback loops, the ability to store/restore context, and so on, what could it then do? This isn't just a question of putting the right capabilities in a box. They have to work together for a goal. But I'm sure that we haven't achieved the limit of what can be achieved.
these are things we can teach children to do when they don't do it at first. I don't see why we can't teach this behavior to AI. Maybe we should teach LLM's to play games or something. or do those proof thingys that they teach in US high school geometry or something like that. To learn some formal structure within which they can think about the world
It feels like humans do do a similar regurgitation as part of a reasoning process, but if you play around with LLMs and ask them mathematical questions beyond the absolute basics it doesn’t take long before they trip up and reveal a total lack of ‘understanding’ as we would usually understand it. I think we’re easily fooled by the fact that these models have mastered the art of talking like an expert. Within any domain you choose, they’ve mastered the form. But it only takes a small amount of real expertise (or even basic knowledge) to immediately spot that it’s all gobbledygook and I strongly suspect that when it isn’t it’s just down to luck (and the fact that almost any question you can ask has been asked before and is in the training data). Given the amount of data being swallowed, it’s hard to believe that the probabilistic regurgitation you describe is ever going to lead to anything like ‘reasoning’ purely through scaling. You’re right that asking what reasoning is may be a philosophical question, but you don’t need to go very far to empirically verify that these models absolutely do not have it.
On the other hand, it seems rather intuitive we have a logic based component? Its the underpinning of science. We have to be taught when we've stumbled upon something that needs tested. But we can be taught that. And then once we learn to recognize it, we intuitively do so in action. ChatGPT can do this in a rudimentary way as well. It says a program should work a certain way. Then it writes it. Then it runs it. Then when the answer doesn't come out as expected (at this point, probably just error cases), it goes back and changes it.
It seems similar to what we do, if on a more basic level. At any rate, it seems like a fairly straight forward 1-2 punch that, even if not truly intelligent, would let it break through its current barriers.
LLMs can be trained on all the math books in the world, starting from the easiest to the most advanced, they can regurgitate them almost perfectly, yet they won't apply the concepts in those books to their actions. I'd count the ability to learn new concepts and methods, then being able to use them as "reasoning".
Aren't there quite a few examples of LLMs giving out-of-distribution answers to stated problems? I think there are two issues with LLMs and reasoning:
1. They are single-pass and static - you "fake" short-term memory by re-feeding the questions with it answer
2. They have no real goal to achieve - one that it would split into sub-goals, plan to achieve them, estimate the returns of each, etc.
As for 2. I think this is the main point of e.g. LeCun in that LLMs in themselvs are simply single-modality world models and they lack other components to make them true agents capable of reasoning.
We have the ability to follow a chain of reasoning, say "that didn't work out", backtrack, and consider another. ChatGPT seems to get tangled up when its first (very good) attempt goes south.
This is definitely a barrier that can be crossed by computers. AlphaZero is better than we are at it. But it is a thing we do which we clearly don't simply do with the probabilistic regurgitation method that ChatGPT uses.
That said, the human brain combines a bunch of different areas that seem to work in different ways. Our ability to engage in this kind of reason, for example, is known to mostly happen in the left frontal cortex. So it seems likely that AGI will also need to combine different modules that work in different ways.
On that note, when you add tools to ChatGPT, it suddenly can do a lot more than it did before. If those tools include the right feedback loops, the ability to store/restore context, and so on, what could it then do? This isn't just a question of putting the right capabilities in a box. They have to work together for a goal. But I'm sure that we haven't achieved the limit of what can be achieved.