by veber-alex
54 days ago
In my experience, benchmarks are pretty meaningless. Performance depends not only on the language and the tasks given but also on the prompts used and the expected results. In my own internal tests, it was really hard to judge whether GPT 5.5 or Opus 4.7 is the better model. They have different styles, and it's basically up to preference. There were even times when I gave the win to one model, only to think about it more and change my mind. At the end of the day, I think I slightly prefer Opus 4.7.
|
It's a strong signal for a job, but the soft skills are sometimes going to get Claude Opus 4.6 a job over smarter applicants. That's what we'd really like to measure objectively, and we're actively working on it.