Hacker News
by jcuenod 376 days ago
82.2 on Aider

Still actually falling behind the official scores for o3 high. https://aider.chat/docs/leaderboards/


Does 82.2 correspond to the "Percent correct" of the other models?

Not sure if OpenAI has updated o3, but it looks like "pure" o3 (high) has a score of 79.6% in the linked table, while the "o3 (high) + gpt-4.1" combo has the highest score at 82.7%.

The previous Gemini 2.5 Pro Preview 05-06 (yeah, not the current 06-05!) was at 76.9%.

That looks like a pretty nice bump!

But either way, these Aider benchmarks seem to be the most useful/trustworthy benchmarks currently, and really the only ones I'm paying attention to.

But so.much.cheaper.and.faster. Pretty amazing.
That's the older 05-06 preview, not the new one from today.
They knew that. The 82.2 comes from the new benchmarks in the OP not from the aider url. The aider url was supplied for comparison.
Ah, thanks for clearing that up!