| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ivanech 114 days ago
	Hmm in my experience (I've done a lot of head-to-heads), Opus 4.6 is a weaker reviewer than GPT 5.4 xhigh. 5.4 xhigh gives very deep, very high-signal reviews and catches serious bugs much more reliably. I think it's possible you're observing Opus 4.6's higher baseline acceptance rate instead of GPT 5.4's higher implementation quality bar.

4 comments

parasti 114 days ago

This is also my experience using both via Augment Code. Never understood what my colleagues see in Claude Opus, GPT plans/deep dives are miles ahead of what Opus produces - code comprehension, code architecture is unmatched really. I do use Sonnet for implementation/iteration speed after seeding context with GPT.

link

egeozcan 114 days ago

I agree. Opus, forget the plan mode - even when using superpowers skill, leaves a lot of stuff dangling after so many review rounds.

Along with claude max, I have a chatgpt pro plan and I find it a life-saver to catch all the silliness opus spits out.

link

jonnycoder 114 days ago

I agree, I use codex 5.4 xhigh as my reviewer and it catches major issues with Opus 4.6 implementation plans. I'm pretty close to switching to codex because of how inconsistent claude code has become.

link

petcat 114 days ago

Maybe it's all just anecdotal then. Everyone is having different experiences.

Maybe we're being A/B tested.

link

femiagbabiaka 114 days ago

The experience one has with this stuff is heavily influenced by overall load and uptime of Anthopic's inference infra itself. The publicly reported availability of the service is one 9, that says nothing of QoS SLO numbers, which I would guess are lower. It is impossible to have a consistent CX under these conditions.

link