by croniev
863 days ago
Would you say that GPT-4 can reason now? I am not convinced this is the case; it seems it has just become more consistent at giving us output that we consider reasonable, because it was engineered precisely to do that.
Let's assume reasoning entails going beyond the stochastic parrot level. Can LLMs have skills not demonstrated in the training set?
Here is a paper demonstrating that GPT-4 can combine up to 5 skills from a set of 100, so the space is on the order of 100^5 skill tuples, while it can only have seen far fewer combinations on any specific topic during training.
> simple probability calculations indicate that GPT-4's reasonable performance on k=5 is suggestive of going beyond "stochastic parrot" behavior (Bender et al., 2021), i.e., it combines skills in ways that it had not seen during training https://arxiv.org/abs/2310.17567
So the models show an ability to freely combine skills, and the k=5 result on this benchmark illustrates that they do generalize. They can apply skills correctly in new combinations, but there is also a limit to how many they can combine.
The interesting part is how they demonstrate this: for a topic with, say, n=1000 samples in the training set, there cannot be enough training examples to cover the tuples of 5 skills, yet models (mostly GPT-4) can handle them. Other models top out at tuples of only 2 or 3 skills.
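To make the counting argument concrete, here is a rough back-of-the-envelope sketch. It is my own illustration, not the paper's calculation: the function name `max_coverage` and the assumption that each training sample demonstrates at most one k-tuple of skills are hypothetical simplifications. It counts unordered tuples, which is smaller than 100^5 and so if anything flatters the "it saw everything in training" view.

```python
import math

def max_coverage(num_skills, k, num_samples, tuples_per_sample=1):
    """Upper bound on the fraction of k-skill combinations that could
    appear in training, ASSUMING each sample demonstrates at most
    `tuples_per_sample` distinct combinations (a simplification)."""
    total = math.comb(num_skills, k)  # unordered k-subsets of the skills
    covered = min(total, num_samples * tuples_per_sample)
    return covered / total

# Size of the combination space for 100 skills, k = 5.
unordered = math.comb(100, 5)  # 75,287,520 unordered combinations
ordered = 100 ** 5             # 10,000,000,000 if order and repeats count

# Best-case coverage from n = 1000 samples on a topic, for each k.
for k in range(1, 6):
    frac = max_coverage(num_skills=100, k=k, num_samples=1000)
    print(f"k={k}: at most {frac:.6%} of combinations covered")
```

Under these assumptions, 1000 samples can cover all single skills and a fair share of the pairs, but only a vanishing fraction of the 5-skill combinations, which is the gap the argument relies on.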
Models combining skills in new ways are not just parroting. They can perform meaningful work outside their training distribution.