by saberience
37 days ago
This is kinda FUD. First, Claude really isn't that good compared to Codex, and if you combine the latest DeepSeek model with any good coding harness, the results are surprisingly comparable to Claude Code. DeepSeek is definitely behind Codex, but Claude hasn't impressed me for some time now. It writes far more code than it needs to, in a way that gradually rots your codebase. Codex is the only model I've used that regularly removes more code than it adds, makes a fix or feature with a single line of code, or otherwise makes minimal working changes. Claude is the model that gets the feature working by adding two new classes, 20 new methods, and 2000 lines of code, when it actually needed to remove 500 lines and add two new methods. Claude also often refactors by adding tons of new code and using it without deleting any of the old code.
But that was 3 months ago and I haven't tried it since, so they may have improved.
To be fair, I think what you mean, if I drop the literal framing, is this (tell me if I'm right):
Codex > Claude in my setup.
Is that right?
To be fair, my tests were not apples to apples. I have sophisticated agent alignment harnesses that keep Claude from hallucinating or going off the rails (not literally, not 100%: about 80% less hallucination, about 90% less drift, and about 98% more starting from crystal-clear intent).
In my personal tests, Codex was not calibrated to use those systems; it had them but would have needed to find them on its own.
Also, I'm working in a massive project (next ai labs, ixcoach) with likely in the range of 20k code files and 100x files of docs...
It could just be my agent alignment harness that's making Claude outperform Codex. I'm looking into testing it on the major benchmarks and publishing the results.