|
|
|
|
|
by malisper
444 days ago
|
|
Other models aren't able to solve it so there's something else happening besides it being in the training data. You can also vary the problem and give it a number like 85 instead of 65 and Gemini is still able to properly reason through the problem |
|
There are lots of possible mechanisms by which this particular problem would become more prominent in the weights in a given round of training even if the model itself hasn't actually gotten any better at general reasoning. Here are a few:
* Random chance (these are still statistical machines after all)
* The problem resurfaced recently and shows up more often than it used to.
* The particular set of RLHF data chosen for this model draws out the weights associated with this problem in a way that wasn't true previously.