| A comment from r/machinelearning: "It seems obvious from the demos that GPT-3 is capable of reasoning. But not consistently. It would be critical, imo, to see if we can identify a pattern of activity in it associated with the lucid responses vs activity when it produces nonsense. If/when we have such a pattern we would need to find a way to enforce it to happen in every interaction" And people agree: "Dunno why you are getting downvoted, I agree with you. It seems like to get GPT-3 to do good reasoning you have to convince it that it is writing about a dialogue between two smart people. Talking to Einstein, giving some good examples, etc. all seem to help. Shaping really seems to matter, but I don’t think we have enough access to the hidden state to determine if there are quantitative differences between when it is more lucid and when it isn’t. It’s like Gwern said: “sampling cannot prove the absence of knowledge, only the presence of it” (because whenever it fails, maybe with a different context, different sampling parameters, using spaces between letters, etc. it would have worked)" It's interesting that this kind of speculation is entering the conversation. I think we are on the cusp |
The fact that there could be reasoning going on is certainly exciting by itself. But I don't think it's fair to call it obvious without a compact specification for how to make GPT-3 perform a general class of reasoning. Less "here's a script to make it output stuff about balanced parens", more "here's a strategy to teach it most basic string manipulations".
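
To make that distinction concrete, here is a rough sketch of what testing a strategy rather than a script could look like: one fixed few-shot prompt format, scored across a small class of string tasks. Everything in it is my own assumption for illustration: the task list, the prompt layout, the helper names, and especially `complete`, which stands in for whatever function sends a prompt to the model and returns its text. It is not a real GPT-3 API call.

```python
import random
from typing import Callable, Dict, List, Tuple


def is_balanced(s: str) -> str:
    """Return "yes" if a string made only of parentheses is balanced, else "no"."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return "no"
    return "yes" if depth == 0 else "no"


# A small, illustrative class of string tasks (hypothetical, not a benchmark).
TASKS: Dict[str, Callable[[str], str]] = {
    "reverse": lambda s: s[::-1],
    "uppercase": lambda s: s.upper(),
    "balanced": is_balanced,
}


def make_input(task: str, rng: random.Random) -> str:
    """Draw a random input string of length 2 to 6 for the given task."""
    alphabet = "()" if task == "balanced" else "abcdef"
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(2, 6)))


def build_prompt(task: str, shots: List[Tuple[str, str]], query: str) -> str:
    """One fixed prompt format shared by every task: this is the 'strategy'."""
    lines = [f"Task: {task}"]
    lines += [f"Input: {x}\nOutput: {y}" for x, y in shots]
    lines.append(f"Input: {query}\nOutput:")
    return "\n".join(lines)


def evaluate(
    complete: Callable[[str], str],  # assumed: prompt in, model text out
    n_shots: int = 3,
    n_trials: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Return per-task accuracy of `complete` under the shared prompt format."""
    rng = random.Random(seed)
    scores: Dict[str, float] = {}
    for task, solve in TASKS.items():
        correct = 0
        for _ in range(n_trials):
            inputs = [make_input(task, rng) for _ in range(n_shots)]
            shots = [(x, solve(x)) for x in inputs]
            query = make_input(task, rng)
            answer = complete(build_prompt(task, shots, query)).strip()
            correct += answer == solve(query)
        scores[task] = correct / n_trials
    return scores
```

A strategy would count as general only if accuracy stays high on every task in the class, and ideally on tasks added after the prompt format was fixed. Per the Gwern quote above, a low score here would still only show that this particular format failed, not that the model lacks the ability.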