| HN Mirror



	by shivang2607 39 days ago
	In my personal experience, no model comes close to Claude when it comes to coding performance, no matter what the benchmarks say. Having said that, I really hope this DeepSeek model performs on par with the Claude Sonnet model.

3 comments

Our_Benefactors 39 days ago

Codex is good now. I’m undecided which is better, but they’re definitely close enough that I feel comfortable recommending that Claude-exclusive people in my circle try Codex.

ticulatedspline 39 days ago

Is there a good 4th option yet? I haven't really been impressed by what I've tried so far.

I believe Claude Code only works with Claude, and all I hear about it is that it's great but the token limits are so anemic as to make it useless unless you want to shell out $200+ a month, which I do not, so I haven't bothered.

I tried Codex but it wouldn't run out of the box. Installation on a fresh Windows box resulted in some obscure error, which is a strong "this product isn't fully baked" signal.

OpenCode desktop has thus far been the only turn-key solution: it worked right away on its pickle model but was a real pain to hook up to anything else. It exhibits a lot of the typical obtuse UX that open source projects end up with, since open source tends to attract coders and developers more than UX/UI people. At least it does mention that it's still in beta.

DeathArrow 38 days ago

>I believe Claude Code only works with claude and seems all I hear about that is it's great but the token limits are so anemic as to make it useless unless you want to shell out $200+ a month, which I do not so I haven't bothered.

I use Claude Code with GLM 5.q, Kimi K2.6, MiniMax M2.7 and Xiaomi MiMo V2.5 Pro.

Larrikin 38 days ago

Why use Claude Code over something like OpenCode then? From my limited usage of the tools over the past couple of months, Claude Code's ergonomics feel strictly worse than OpenCode's, but I haven't deeply investigated either yet. I am using Claude models in both, so I am getting a one-to-one comparison.

DeathArrow 38 days ago

Because I tried OpenCode, and Claude Code seems like a better harness. It has the best plugins, and many skills are designed to work best with Claude Code.

ticulatedspline 38 days ago

Ah, I wasn't aware it was open for use with non-Claude models. I'll take a closer look at it then. Thanks.

wett 39 days ago

I liked Zed’s agent as a harness. It’s LLM agnostic. My org just got GitHub Copilot and I use it as the API provider for requests.

shivang2607 36 days ago

Maybe, in your experience, but I have tried both Claude and Codex, and the code quality that Claude produces is still superior in my experience. Maybe it's because of the nature of the app I am building.

saberience 39 days ago

This is kinda FUD. First, Claude really isn’t that good compared to Codex, and if you combine the latest DeepSeek model with any good coding harness, the results are surprisingly comparable to Claude Code.

I would say DeepSeek is definitely behind compared to Codex, but Claude doesn’t impress me and hasn’t for some time now. It writes way too much code when it doesn’t need to, in a fashion that gradually rots your codebase.

Codex is the only model I’ve used that will regularly remove more code than it adds, make a fix or add a feature with a single line of code, or otherwise make minimal working changes.

Claude is the model which can get the feature working by adding two new classes, 20 new methods and 2000 lines of code, when it actually needed to remove 500 lines of code and add two new methods.

Claude will also often refactor by adding tons of new code and using it while not deleting any of the old code.

IXCoach 39 days ago

Fascinating. Claude outperforms Codex in my coding setup by 5X to 10X; it's not even close. Interesting that you're claiming the opposite is a general fact here, if I take you literally at your word. Codex has been so bad, in fact, that while I was maintaining a $200/m sub I literally did not even "use it up" while paying for it (after cancelling).

But that was 3 months ago. I have not tried it since, and they could have grown.

To be fair, I think what you mean, if I drop the literal frame here, is this (tell me if I am right):

Codex > Claude in my setup.

Is that right?

To be fair, my tests were not apples to apples. I have sophisticated agent alignment harnesses which prevent Claude from hallucinating or going off the rails (not literally, not 100%: about 80% less hallucination, about 90% less drift, and about 98% more starting from crystal clear intent).

And in my personal tests, Codex was not calibrated to use those systems; it had them but would have needed to find them.

Also I am in a massive project, next ai labs, ixcoach, with likely in the range of 20k files of code, 100x files of docs...

It could just be my agent alignment harness that's making Claude outperform Codex. I'm looking into testing it on the major benchmarks and publishing the results.

ay 39 days ago

Both you and parent could be right.

There is a fun term: “jagged frontier”.

Meaning: one model can be much better than another at one thing, and much worse at something else.

nsonha 37 days ago

Codex 3 months ago was really different (I think they released 2 versions since?). That was also around the time I started using Codex, and I see no difference in quality compared to Claude. I'm still using Claude mainly because work mandates it, but I'm doing all my personal stuff with Codex now.

rapind 37 days ago

Claude at times feels lobotomized compared to where it was a few months ago with 4.6. I think Anthropic was (is still?) struggling with their infrastructure, and it hasn't felt as good for me, anecdotally, for a few months. Significantly enough that I've cancelled (Max 20x).

Codex 5.5 extra high currently feels a good amount smarter than either 4.6 or 4.7 Opus. I only just started using it about a week ago, so maybe that's a recent development and then OpenAI will eventually lobotomize their model or throttle etc.

What I dislike about frontier models is how opaque and incentivized the businesses are about tweaking their services. Anthropic definitely does some shady throttling. I have zero trust for Altman and his BS AGI claims. And Google makes it non-obvious that you can't turn Gemini training off on even their highest tier personal plan. There's a lot of shady and dishonest behaviour, probably because they are all overhyped and heavily subsidized to win the race. I don't mind at all paying more than I currently am for these services, but I don't trust any of these frontier model companies, and so I'm cheering for open models.

Right now I'm using Codex for planning and DeepSeek V4 Flash [1m] for implementation. It's quite fast. Quite possible / likely that OpenAI will make significant changes that kill this workflow for w/e reason... at which point I will probably move to full open weight models.

shlewis 38 days ago

It's also 28 times more expensive than V4 Pro and 111 times more expensive than V4 Flash.
