Hacker News
by elahieh, 40 days ago
One reason might be that Claude Opus 4.7 (thinking) scores better on the Arena coding leaderboard:
https://arena.ai/leaderboard/text/coding
... hopefully that ranking actually reflects correctness. It doesn't account for reliability, though.