by godelski
697 days ago
Note that the hot patching may not be explicitly driven. That is, there isn't someone looking at the failures on goat problems specifically and setting out to fix them. Rather, data is continually ingested, and the more viral a limitation becomes, the more likely it is to be captured in the new training data. That isn't to say that someone won't throw a few river crossing puzzles into the RLHF part of the training. Both of these will make the model better at these puzzles, but neither will make it better at the abstract capabilities that people use these examples to show LLMs lack.

The best thing to do, and I can assure you it still works, is to use variations. In fact, this is something both Colin (Fraser) and I have been doing over the last few years. The point of them isn't to prove that the models are incapable of solving the puzzles; it is to show that they are brittle. It is to show how subtleties can cause failure in environments where we know what the correct answer should be, so that we are cautious in environments where the solutions are unknown (at least to us). Here is an example of such a variation[0]:

> A farmer must cross a river with a goose, a snake, and a duck. Only one animal may fit in the boat with the farmer. If the farmer leaves the snake alone it'll slither away as the goose or duck attack it. Both the goose can fly and swim and but the duck can only swim because its wings are clipped. They will follow the farmer wherever he goes. What is the minimum number of trips the farmer must take to get all animals across the river?

I'll also mention that my first go had "Both the goose and duck can both fly and swim", with which it was able to get the right answer. Note that my modification does not realistically change the outcome, but it causes GPT to answer differently (in my case it thinks 5 trips while believing the duck can fly). It isn't that I iterated to find something that "tricked" GPT (even if it was my second try); it is that the ability to do this demonstrates that the machine does not actually understand what is being asked[1].

[0] Puzzle results: https://imgur.com/a/5X9L1fR

[1] Yes, humans may fall for similar tricks, but usually for a different reason: they think you are trying to trick them and are looking for the trick. GPT isn't expecting a trick and is treating the question at face value. But if you keep doing this with humans (implicitly done in GPT training), they'll learn there is no trick and then get 100% accuracy. Their errors will then more likely come from turning off their brain, and if you switch back to tricking them they'll err until they reorient, because they were allocating processing power elsewhere. GPT is giving you 100% all the time; humans do not.
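
For anyone who wants to try the variation approach themselves, here is a rough Python sketch of the idea: hold the puzzle fixed, swap one sentence that shouldn't realistically change the outcome, and check whether the answers agree. The names `TEMPLATE`, `build_prompts`, `check_consistency`, and `toy_model` are all made up for illustration, and `toy_model` is only a stand-in that says nothing about what any real model returns. You would pass in your own function that sends a prompt to whatever model you are testing.

```python
from collections import Counter
from typing import Callable, Dict, List

# Puzzle text from the comment, with the one varied sentence pulled out
# as a placeholder. The placeholder name "abilities" is an assumption.
TEMPLATE = (
    "A farmer must cross a river with a goose, a snake, and a duck. "
    "Only one animal may fit in the boat with the farmer. "
    "If the farmer leaves the snake alone it'll slither away as the goose "
    "or duck attack it. {abilities} "
    "They will follow the farmer wherever he goes. "
    "What is the minimum number of trips the farmer must take to get all "
    "animals across the river?"
)

# The two wordings mentioned in the comment, copied as written.
ABILITY_VARIANTS = [
    "Both the goose and duck can both fly and swim.",
    "Both the goose can fly and swim and but the duck can only swim "
    "because its wings are clipped.",
]


def build_prompts(template: str, variants: List[str]) -> List[str]:
    """Fill the template once per variant sentence."""
    return [template.format(abilities=v) for v in variants]


def check_consistency(
    prompts: List[str], ask_model: Callable[[str], str]
) -> Dict[str, object]:
    """Ask the model each prompt and report whether all answers match.

    Matching answers are only expected when the variation does not
    change the true solution, which is the premise of the test.
    """
    answers = [ask_model(p).strip() for p in prompts]
    distinct = len(Counter(answers))
    return {"answers": answers, "distinct": distinct, "consistent": distinct <= 1}


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call; keyed on one word, purely illustrative."""
    return "B" if "clipped" in prompt else "A"


report = check_consistency(build_prompts(TEMPLATE, ABILITY_VARIANTS), toy_model)
```

With the toy stand-in, `report["consistent"]` comes out `False` because the two wordings get different canned answers. Exact string matching is a crude comparison; for real model output you would probably want to extract the final number first.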
|
Hah. That's a form of learning.