by nahnahno 218 days ago
The fact that GPT-4.1 was the judge does not convince me of the validity of the benchmark.

It’s probably just that they started before GPT-5 was released. It’s a good judge.
It's an odd choice. I'd be curious why they picked it. It's not the cheapest, the most expensive, the best, or the worst.

It does have a relatively large context window, and in my experience it is very good at format adherence.

You may be looking at our first benchmarks on the homepage. The latest ones, for the Search API, were conducted against GPT-5: https://parallel.ai/blog/introducing-parallel-search