|
|
|
|
|
by seanhunter
943 days ago
|
|
Actual reasoning shows the understanding and use of a model of the key features of the underlying problem/domain. As a simple example that you can replicate using chatgpt, ask it to solve some simple maths problem. Very frequently you will get a solution that looks like reasoning but is not, and reveals that it does not have an actual model of the underlying maths but is in fact doing text prediction based on a history of maths. For example see here[1]. I ask it for some quadratics in x with some specification on the number of roots. It gives me what looks at first glance like a decent answer. Then I ask the same exact question but asking for quadratics in x and y[2]. Again the answer looks plausible except that for the solution "with one real root" it says the solution has one real root when x + y =1. Well there are infinite real values for x and y such that x + y =1, not one real root. It looks like it has solved the problem but instead it has simulated the solving of the problem. Likewise stacking problems, used to check for whether an AI has a model of the world. This is covered in "From task structures to world models: What do LLMs know?"[3] but for example here[4] I ask it whether it's easier to balance a barrel on a plank or a plank on a barrel. The model says it's easier to balance a plank on a barrel with an output text that simulates reasoning discussing center of mass and the difference between the flatness of the plank and the tendency of the barrel to roll because of its curvature. Actual reasoning would say to put the barrel on its end so it doesn't roll (whether you put the plank on top or not). [1] https://chat.openai.com/share/64556be8-ad20-41aa-99af-ed5a42... [2] https://chat.openai.com/share/2cd39197-dc09-4d07-a0d6-6cd800... [3] https://arxiv.org/abs/2310.04276 [4] https://chat.openai.com/share/4b631a92-0d55-4ae5-8892-9be025... |
|
If you were to ask the same question of a real person and they replied with the exact same answer you could not conclude that person was not capable of "actual reasoning". It's a bit of witch-hunt question set to give you the conclusion you want.