

by therealmarv 366 days ago

Not according to the Aider leaderboard: https://aider.chat/docs/leaderboards/

I use only the APIs directly with Aider (so no experience with AI Studio).

My feeling with Claude is that it still performs well with weak prompts; its "taste" is maybe a little better when the direction is kinda unknown to the prompter.

When the direction is known, I see Gemini 2.5 Pro (with thinking) ahead of Claude, producing code that does not break. And with o4-mini and o3 I see more "smart" thinking (as if there is a little bit of brain inside these models) at the expense of producing unstable code (Gemini produces more stable code).

I see problems with Claude when complexity increases and I would put it behind Gemini and o3 in my personal ranking.

So far I have had no reason to go back to Claude since o3-mini was released.


stavros 366 days ago

I just spent $35 for Opus to solve a problem with a hardware side-project (I'm turning an old rotary phone into a meeting handset so I can quit meetings by hanging up, if you must know). It didn't solve the problem, it churned and churned and spent a ton of money.

I was much more satisfied with o3 and Aider. I haven't tried them on this specific problem, but I did quite a bit of work on the same project with them last night. I think I'm being a bit unfair, because what Claude got stuck on seems to be a hard problem, but I don't like how they'll happily consume all my money trying the same things over and over, and never say "yeah I give up".

antgiant 366 days ago

For basically that same price you could get one of these :-)

https://www.amazon.com/Cell2jack-Cellphone-Adapter-Receive-l...

stavros 366 days ago

Where's the fun in that?!

antgiant 366 days ago

Enjoy yourself! Don’t let me spoil your fun :-)

stavros 366 days ago

Oh I'm not! I'll post it here when I'm done, it's already hilarious.

sans_souse 366 days ago

Wait, you're using a rotary phone?

stavros 366 days ago

I want to!

alecco 366 days ago

Give them feedback.

stavros 366 days ago

Feedback on what?

CamperBob2 366 days ago

When I obtain results from one paid model that are significantly better than what I previously got from another paid model, I'll typically give a thumbs-down to the latter and point out in the comment that it was beaten by a competitor. Can't hurt.

stavros 366 days ago

Ah, this wasn't from the web interface, I was using Claude Code. I don't think it has a feedback mechanism.

macNchz 366 days ago

Using all of the popular coding models pretty extensively over the past year, I've been having great success with Gemini 2.5 Pro as far as getting working code the first time, instruction following around architectural decisions, and staying on-task. I use Aider and write mostly Python, JS, and shell scripts. I've spent hundreds of dollars on the Claude API over time but have switched almost entirely to Gemini. The API itself is also much more reliable.

My only complaint about 2.5 Pro is around the inane comments it leaves in the code (// Deleted varName here).

ZeWaka 366 days ago

If you use one of the AI static instructions methods (e.g., .github/copilot-instructions.md) and tell it to not leave the useless comments, that seems to solve the issue.

macNchz 366 days ago

I've been intending to try some side by side tests with and without a conventions file instructing it not to leave stupid comments—I'm curious to see if somehow they're providing value to the model, e.g. in multi-turn edits.

luckydata 366 days ago

It's easier to just make it do a code review with a focus on removing unhelpful comments than to ask it not to add them in the first place. I do the cleanup after major rounds of work, and that strategy seems to work best for me.

jjani 366 days ago

This was not my experience with the earlier preview (03), where its insistence on comment spam was too strong to overcome. Wonder if this adherence improved in the 05 or 06 updates.

sans_souse 366 days ago

can you elaborate on this?

dominicrose 365 days ago

I don't mind the comments, I read them while removing them. It's normal to have to adapt the output, change some variable names, refactor a bit. What's impressive is that the output code actually works (or almost). I didn't give it the hardest of problems to solve/code but certainly not easy ones.

macNchz 365 days ago

Yeah I've mostly just embraced having to remove them as part of a code review, helps focus the review process a bit, really.

avereveard 366 days ago

I'm using Pro for backend and Claude for UX work. Claude is so much more thoughtful about how users interact with software, and it can usually replicate the mockups that the GPT-4o image generator produces better, while not being overly fixated on the mockup design itself.

My complaint is that it catches Python exceptions and doesn't log them by default.

miki123211 365 days ago

And the error handling. God, does it love to insert random try/except statements everywhere.
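A minimal Python sketch of the pattern the two comments above describe. It is illustrative only, not code from either commenter or from any model, and the function names `parse_port_swallowed` and `parse_port_logged` are hypothetical. The first version catches the exception and silently discards it; the second keeps the same fallback but records the message and traceback with the standard library's `logger.exception`.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def parse_port_swallowed(raw: str) -> Optional[int]:
    # Hypothetical example of the complained-about pattern: the exception is
    # caught and thrown away, so the caller only sees None with no clue why.
    try:
        return int(raw)
    except ValueError:
        return None


def parse_port_logged(raw: str) -> Optional[int]:
    # Same fallback behaviour, but logger.exception emits an ERROR-level
    # record that includes the traceback before returning None.
    try:
        return int(raw)
    except ValueError:
        logger.exception("could not parse port from %r", raw)
        return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(parse_port_swallowed("80a"))  # None, and nothing is logged
    print(parse_port_logged("80a"))     # None, plus an ERROR record with traceback
```

Both functions return the same value on bad input; the difference is only whether the failure leaves a trace that someone can find later.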

hirako2000 366 days ago

Your feelings of a little brain in there, and of stable code, are unfounded. All these models collapse pretty fast. If not due to the context limit, then due to their inability to interpret problems.

An LLM is just statistical regression with a plethora of engineering tricks, mostly NLP, to produce an illusion.

I don't mean it's useless. I mean comparing these ever-evolving models is like comparing escort staff in NYC vs those in L.A.: hard to reach any conclusion. We are getting fooled.

On the price increase, it seems Google was aggressively looking for adoption; for a short time, Gemini was the best value for money of all the LLMs out there. Adoption likely surged, so scaling needs must have been astronomical, costing Google billions to keep up. The price adjustment could've been expected before they announced it.