by cristiancavalli
497 days ago
This looks neat, but I don’t think it meets the standard for “reasoning only.” (I'm still not sure how you would prove that one.) Furthermore, this looks to be fairly generalizable in pattern and form to other grid problems, so I also don’t think it meets the bar for “not being in the training data.” We know these models can generalize somewhat based upon their training, but not consistently and certainly not consistently well. Again, I’m not making the claim that responding to a novel prompt is a sign of reasoning; as others have pointed out, a calculator can do that too. Your quote:
“This is a unique problem I came up with. It’s a variation on counting islands.”
You then say:
“as I came up with it so no variation of it really exists anywhere else.” So I'm not sure what to take away from your text, but I do think this is a variation of a well-known problem type, so I would be pretty amazed if there wasn't something very close to this in the training data. Given that it’s an interview question, and those are written about ad nauseam, I’m not surprised that it was able to generalize to the provided case.
The CoT researchers did see the ability to generalize in some cases. The models just did not necessarily use the CoT tokens to reason, and/or they failed to generalize on variations where the researchers thought they should have, given their ability to generalize in other cases and the postulation that they were using reasoning and not just a larger corpus to pattern match against.
|
The solution, however, is not a variation. It requires leaps of creativity that most people will be unable to make. In fact, I would argue this goes beyond just reasoning, as you have to be creative and test possibilities to even arrive at a solution. It’s almost random chance that will get you there. Simple reasoning like logical reduction won’t let you arrive at a solution.
Additionally, this question was developed to eliminate the pattern matching that candidates use in software interviews. It was vetted and verified not to exist elsewhere. No training data exists.
It definitively requires reasoning to solve. And it is also unlikely you solved it. ChatGPT o3 has solved it. Try it.