


	by nahnahno 265 days ago
	The fact that GPT-4.1 was the judge does not convince me of the validity of the benchmark.

2 comments

It’s probably just that they started before GPT-5 was released. It’s a good judge.

it's an odd choice. I'd be curious why they picked that. it's not the cheapest, most expensive, best, or worst.

It does have a relatively large context window, and ime is very good at format adherence

You may be looking at our first benchmarks on the homepage. The latest ones for the Search API were conducted against GPT-5: https://parallel.ai/blog/introducing-parallel-search