by filleduchaos 94 days ago
I would in fact expect any human who's as good at writing code as the various state-of-the-art LLMs supposedly are (if you take the breathless proclamations of their hype bros at face value) to be able to solve the rather simple problems in the benchmark, given the relevant esolang spec and some time to figure it out.

It's not as if the models here were asked to write a kernel in Brainfuck; the medium tier of problems here contains such apparently insurmountable tasks as "calculate the nth prime".
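
For a sense of scale, here is roughly what "calculate the nth prime" amounts to in a conventional language. This is only a sketch in Python, not in any of the benchmark's esolangs and not taken from the benchmark itself. The function name `nth_prime` and the 1-indexed convention are my own assumptions.

```python
def nth_prime(n: int) -> int:
    """Return the n-th prime (1-indexed, so nth_prime(1) == 2) by trial division."""
    if n < 1:
        raise ValueError("n must be >= 1")
    primes = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no previously found prime p with p*p <= candidate divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes[-1]


print(nth_prime(10))
```

Porting a loop like that to an esolang is tedious, but it's bookkeeping, not deep algorithmic insight.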