by sottol
375 days ago
Does 82.2 correspond to the "Percent correct" of the other models? Not sure if OpenAI has updated o3, but it looks like "pure" o3 (high) has a score of 79.6% in the linked table, and the "o3 (high) + gpt-4.1" combo has the highest score at 82.7%. The previous Gemini 2.5 Pro Preview 05-06 (yeah, not the current 06-05!) was at 76.9%. That looks like a pretty nice bump! Either way, these Aider benchmarks seem to be the most useful/trustworthy benchmarks right now, and really the only ones I'm paying attention to.