We do do genuine reasoning. It takes us a lot of practice to learn, but we also use far less "electricity" to do it.
The thing about LLMs is that there doesn't seem to be a way to teach them genuine reasoning. You could spend a month teaching an LLM Brainfuck and it would likely still fail at a novel problem, whereas a human who studied Brainfuck for a month would probably be quite competent at one.
I would in fact expect any human who's as good at writing code as the various state-of-the-art LLMs (if you take the breathless proclamations of their hype bros at face value) to be able to solve the rather simple problems in the benchmark, given the relevant esolang spec and some time to figure it out.
It's not as if the models here were asked to write a kernel in Brainfuck; the medium tier of problems here contains such apparently insurmountable tasks as "calculate the nth prime".
No. I’m just an NPC in someone else’s simulation. Wandering the world aimlessly, incapable of expressing ideas outside of my training corpus of language. Pathetic.