| HN Mirror



	by jcuenod 376 days ago
	82.2 on Aider. Still actually falling behind the official scores for o3 (high). https://aider.chat/docs/leaderboards/

3 comments

sottol 376 days ago

Does 82.2 correspond to the "Percent correct" of the other models?

Not sure if OpenAI has updated o3, but it looks like "pure" o3 (high) has a score of 79.6% in the linked table, and the "o3 (high) + gpt-4.1" combo has the highest score at 82.7%.

The previous Gemini 2.5 Pro Preview 05-06 (yeah, not the current 06-05!) was at 76.9%.

That looks like a pretty nice bump!

But either way, these Aider benchmarks seem to be the most useful/trustworthy benchmarks currently, and really the only ones I'm paying attention to.

vessenes 376 days ago

But so.much.cheaper.and.faster. Pretty amazing.

hobofan 376 days ago

That's the older 05-06 preview, not the new one from today.

energy123 376 days ago

They knew that. The 82.2 comes from the new benchmarks in the OP, not from the Aider URL. The Aider URL was supplied for comparison.

hobofan 376 days ago

Ah, thanks for clearing that up!