by famouswaffles
1134 days ago
>but they can only reason using their memories and the prompt.

Eh no. https://arxiv.org/abs/2212.10559

>But if you try very hard you can find "held out" data and when you test on it, GPT4 stops looking so smart:

This can be done to anybody. This can be done to you. It's not a gotcha. Nobody is saying GPTs don't/can't memorize.
1. The paper in question demonstrates a formal duality between the transformer architecture and gradient descent. If you take this to indicate that the model reasons in some way, then it would be true of the smallest GPT as well as the largest (it is, after all, a consequence of the architecture rather than anything the model has learned to do per se). In any case, the fact that the model can perform the equivalent of a finite number of gradient-like steps on its way to calculating its final conditional probabilities doesn't really suggest to me that the model reasons in a general way.
2. You are right that no one disputes the model's ability to memorize (and rephrase). What is in question here is whether the model can reason. If it can do 10 code questions it has seen before but fails to do 10 it hasn't seen (of similar difficulty), then that strongly suggests it isn't reasoning its way through the questions, but regurgitating/rephrasing.
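
To make point 1 concrete, here is a rough sketch of the kind of identity involved. It assumes linear (softmax-free) attention, and every name and number in it is made up for illustration, not taken from the paper. It shows that applying accumulated gradient-style updates to a linear layer gives the same output as leaving the weights fixed and attending over the training examples.

```python
# Sketch only: toy sizes, hypothetical values, linear attention assumed.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def outer(e, x):
    """Outer product e x^T as a list of rows."""
    return [[ei * xj for xj in x] for ei in e]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

W0 = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]  # initial weights (2x3)
xs = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]  # training inputs, playing the role of keys
es = [[0.5, -0.5], [1.0, 2.0]]            # error signals (learning rate and sign
                                          # already folded in), playing the role of values
q = [2.0, 1.0, 3.0]                       # test input, playing the role of the query

# View 1: accumulate the update dW = sum_i outer(e_i, x_i), then apply W0 + dW to q.
dW = [[0.0] * 3 for _ in range(2)]
for e, x in zip(es, xs):
    dW = add(dW, outer(e, x))
updated = matvec(add(W0, dW), q)

# View 2: keep W0 fixed and add unnormalised linear attention over the examples.
scores = [dot(x, q) for x in xs]
attn = [sum(s * e[k] for s, e in zip(scores, es)) for k in range(2)]
attended = [base + a for base, a in zip(matvec(W0, q), attn)]

print(updated, attended)
```

The two views agree for any choice of weights and examples, which fits the point above: the identity is a property of the architecture and holds at any model size.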