If you are shown only the title of a coding problem and the name of the site it's from, and you manage to solve it, you are showing that you either cheated or already knew the answer.
On the contrary, it could mean you were able, with some probability, to guess what the problem is and then, with some further probability, to solve it.
The key is, can you guess the problem from the title and the function name? I'd argue sure, at least half the time, why not...
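To put rough numbers on the guess-then-solve argument, here's a minimal sketch. The 0.5 is the "at least half the time" guess from above; the 0.8 solve rate is purely a made-up placeholder, and the function name is hypothetical too. It treats guessing and solving as two successive steps and multiplies them.

```python
def title_only_success_rate(p_guess, p_solve_given_guess):
    """Chance of solving a problem from its title alone.

    p_guess: probability of correctly guessing the problem from the title.
    p_solve_given_guess: probability of solving it once guessed correctly.
    Both values are assumptions for illustration, not measurements.
    """
    for p in (p_guess, p_solve_given_guess):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    # Two successive steps: guess the problem, then solve it.
    return p_guess * p_solve_given_guess


# 0.5 = "at least half the time"; 0.8 is a hypothetical solve rate.
rate = title_only_success_rate(0.5, 0.8)
print(f"Title-only success rate: {rate:.0%}")
```

So under those assumed numbers, a decent title-only score doesn't by itself prove memorization; it depends entirely on how guessable the titles are.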
It can generate never-before-seen strings of comprehensible language. It can react to the inherent logic embedded in words and text and provide a brute-forced version of what a human could. The claim that it can “solve” a problem only through “cheating” is an anthropomorphism that betrays the magic that is evident to anyone who has used these things.
Though this is exactly what happened.
The initial test was run on a model that "cheated" (i.e., had memorized the answers).
The second test was run on a model that didn't "cheat" as much, yet still scored only 2% lower.
So the question isn't really resolved. How much did the first model cheat, and how much did the second?
If the second model "cheats" less, then it wins.
Also, I don't understand your obsession with the word cheating.
If you have solved a problem before on a different website and solve it again, did you cheat? Or did you just use your brain to store the solution for later?
> Also, I don't understand your obsession with the word cheating.
It's all about the rule set, yeah. Since the rule set is not defined, technically nothing is cheating. I just interpret the rule set as "can it code?", and under that rule set it seems to me that it's cheating.
> Okay... Funny how forcing it to not CHEAT did not increase apparent ability.
The article did the opposite. It forced the models to cheat to solve the problems, which they did happily. A model should have stated "there is no actual problem to solve here, you must supply a problem for me to solve".
> "It can code" and "it has memorized some coding questions" are not mutually exclusive
This I will give you. Many humans try to cheat at basic math because they are lazy, so will this model. Maybe that's a sign of intelligence :P
TBH, people underestimate how much of coding is just memorization. I'm guessing those of us with bad memories understand this more than the ones with good memories. :)
I can't remember how many times I've googled "how do I create a directory in Python?". Now Bard often generates an inline answer for me.
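For the record, the answer I keep forgetting looks roughly like this. It's a minimal sketch using the standard library's `pathlib`; the `make_directory` helper and the folder names are just illustrative, and it works in a temp directory so it doesn't litter anything.

```python
import tempfile
from pathlib import Path


def make_directory(path):
    """Create a directory, including any missing parents.

    exist_ok=True means calling this on an existing directory is not an error.
    """
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# Work inside a throwaway temp directory (illustrative names only).
base = Path(tempfile.mkdtemp())
created = make_directory(base / "nested" / "example")
print(created.is_dir())
```

`os.makedirs(path, exist_ok=True)` does the same job if you'd rather not use `pathlib`.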
But in this case it's not like that at all. They only saw the NAME of the problem. It's as if I said "Page 23 of Mathbook Y, problem number 3", which happens to be 6x6.