Hacker News
by the_duke 248 days ago
In my experience, gpt5-codex (medium) with codex-cli is notably better than Sonnet 4.5 with claude-code. (Note: I've never tried Opus.)

It is slower, but the results are correct much more often, and it doesn't rush as eagerly into half-baked solutions or dumb approaches.

I'd much rather wait 5 minutes than have to clean up manually or try to coax a model into doing things differently.

I also wouldn't be surprised if the slowness were partially due to OpenAI being quite resource-constrained. They have repeatedly complained about not having sufficient compute.

Bigger picture: I think all the AI coding environments are incredibly immature. There are many improvements to be unlocked.


Where Codex falls short is background processing (running a daemon in the background and using its output as context while staying interactive for the user) and subagents, i.e., doing multiple things in parallel. Presumably Codex will catch up, but for now that puts Claude Code ahead for me.

As for which one is better, it depends heavily on what each of us is doing. But I will say that I have one project where a bare "make" won't work, and a script has to be run instead. I have instructions to call that script in multiple .md files, and Codex is able to call the script instead of make, but it keeps forgetting, tries to run make, which fails, and then it gets confused. (Claude Code runs on the macOS host, but the build happens on a Linux VM.) I could work around it, but that really takes the "shiny" factor off Codex+GPT-5 for me.

Honestly, I think Codex's simplicity, not doing anything fancy-pants like background coding, is what gives it an edge. I'm happy to wait a while, and even to repeat context to it (it helps me remember stuff anyway), as long as it types out the right thing.
That’s falsifiable quite easily by measuring tokens per second.

Rather, the real reason Codex takes longer is that it does more work: it reads more context.

IMO the results are much better with Codex. It's not even close.