| Not according to the Aider leaderboard: https://aider.chat/docs/leaderboards/ I use only the APIs directly with Aider (so I have no experience with AI Studio). My feeling with Claude is that it still performs well with weak prompts; its "taste" is maybe a little better when the prompter doesn't really know the direction. When the direction is known, I see Gemini 2.5 Pro (with thinking) ahead of Claude, with code that does not break. With o4-mini and o3 I see more "smart" thinking (as if there is a little bit of brain inside these models), at the expense of producing unstable code (Gemini produces more stable code). I see problems with Claude when complexity increases, and I would put it behind Gemini and o3 in my personal ranking. So far I have had no reason to go back to Claude since o3-mini was released. |
I was much more satisfied with o3 and Aider. I haven't tried them on this specific problem, but I did quite a bit of work on the same project with them last night. I think I'm being a bit unfair, because what Claude got stuck on seems to be a hard problem, but I don't like how it will happily consume all my money trying the same things over and over and never say "yeah, I give up".