Hacker News
by
nahnahno
218 days ago
The fact that GPT-4.1 was the judge does not convince me of the validity of the benchmark.
2 comments
ripped_britches
217 days ago
It’s probably just that they started before GPT-5 was released. It’s a good judge.
tacoooooooo
218 days ago
It's an odd choice. I'd be curious why they picked it. It's not the cheapest, most expensive, best, or worst model.
It does have a relatively large context window, and in my experience it is very good at format adherence.
lukaslevert
217 days ago
You may be looking at our first benchmarks on the homepage. The latest ones for the Search API were conducted against GPT-5:
https://parallel.ai/blog/introducing-parallel-search