Hacker News
by srijanshukla18 50 days ago
This is not a serious benchmark, come on.

Tomorrow I'm launching a benchmark where I check if an LLM can build an Airbus A320 from scratch without internet. (Spoiler: no LLM succeeds)

1 comment

Pre-internet people would routinely re-implement Unix and get shell scripts working across systems. This benchmark shows that agentic LLMs can't even do that, not just for complex programs and scripts, but for simple programs and simple scripts. 0%. Which fits with Claude's inability to write a C compiler.