by spongebobstoes 317 days ago
> the “minimal” GPT-5 variant ... achieved a score of 58.5

the image shows it with a score of 62.7, not 58.5

which is right? mistakes like this undermine the legitimacy of a closed benchmark, especially one judged by an LLM


A large chunk of this article reads as if it were LLM-generated, so I guess it was never proofread. Details like this were either never validated, or they could be entirely made up, i.e. hallucinated.
Probably written by an LLM too…