by famouswaffles
1126 days ago
If code transfers to reasoning tasks that have nothing to do with code, then what is being "substituted"? Ideas and concepts? Code and MMLU don't share similar "reasoning patterns" unless you're being extremely vague, in the "they both require reasoning" sense.
|
I won't say these models can't reason per se, but they can only reason using their memories and the prompt. There is nothing else for them to compute on.
In a hand-wavy kind of way, when ChatGPT fails at a riddle phrased to resemble a common riddle, you're seeing overfitting. But given the quantity of data these models consume, it's hard to imagine how to test for overfitting, because the training data contains things similar to almost anything you can imagine. Because of that, I'm still very suspicious of claims that they "reason" in any strong sense of the word.
But if you try very hard, you can find "held out" data, and when you test on it, GPT4 stops looking so smart:
https://teddit.net/r/singularity/comments/121tc48/gpt4_fails...
That said, I've been very impressed by GPT4 as a productivity tool.