
by spongebobstoes 317 days ago

> the “minimal” GPT-5 variant ... achieved a score of 58.5

the image shows it with a score of 62.7, not 58.5

which is right? mistakes like this undermine the legitimacy of a closed benchmark, especially one judged by an LLM

2 comments

rs186 317 days ago

A large chunk of this article reads as if it were LLM-generated, so I guess it was never proofread and details like this were never validated, or they could be entirely made up, i.e. hallucinated.

jama211 317 days ago

Probably written by an LLM too…
