> They're not able to reason, but we can't [succinctly] define what it is.
People also routinely fail to reason, even programmers often write "obvious" logic bugs they don't notice until it gives an unexpected result at which point it's obvious to them. So both humans and AI don't always reason. But humans reason much better.
I myself have observed ChatGPT 4 solving novel problems I invented to my personal satisfaction well enough to say that it seems to have a rudimentary ability to sometimes show abilities we would typically call reasoning, but only at the level of a child. The issue isn't that it is supposed to reason perfectly or that humans reason perfectly, the issue is that it doesn't reason well enough to succeed at completing many kinds of tasks we would like it to succeed at. Please note that nobody expects it to reason perfectly.
"Prove Fermat's last theorem in a rigorous way. Produce a proof that can be checked by Coq, Isabelle, Mizar, or HOL in a format supported directly by any of them" is arguably a request that includes nothing but reasoning and writing code. But we would not expect even Wiles to be able to complete it, and Wiles has actually proved Fermat's last theorem. So we have an idea of reasoning as completing certain types of tasks successfully, and today humans can do it and AI can't.
Today, it fails badly at tasks that require reasoning. A simple example: https://chatgpt.com/share/da95843e-218a-4d69-a161-6aa2d7a3c9...
The issue is that humans can see its answer is wrong and its "reasoning" is wrong. The issue isn't that it never reasons correctly. It's that it doesn't do so often enough or well enough, and it doesn't complete tasks we expect humans to complete, and it doesn't always notice when it is printing something outrageously wrong and illogical. It notices sometimes, it engages in elementary rudimentary guesswork sometimes, but just not often enough or well enough.
> The issue is that humans can see its answer is wrong and its "reasoning" is wrong.
I've noticed with LLMs that they're more likely to come to the wrong conclusion if you prime them in that manner. In this case, you posed the follow-up question as "Will <incorrect conclusion> always be true?" As a result, it's primed to try to prove that incorrect conclusion.
(That said, ChatGPT also failed to answer the question as posed: it changed "difference" to "absolute difference". In fact, the difference alternates between increasing and decreasing, while the absolute difference is strictly increasing.)