by spongebobstoes
317 days ago
> the “minimal” GPT-5 variant ... achieved a score of 58.5

the image shows it with a score of 62.7, not 58.5. which is right? mistakes like this undermine the legitimacy of a closed benchmark, especially one judged by an LLM