It does not make my point moot, however. Take a look at the ARC challenge, a set of simple reasoning tasks that the models have not yet seen: https://arcprize.org/play?task=00576224
All models fail miserably on this because they rely more on memorization and less on logic or reasoning. Simply cherry-picking strikingly good responses, as the author did, proves nothing about model intelligence. I am pretty confident, however, that after a couple of tries a high schooler could do these types of tasks without issue.