Hacker News
by mohsen1 42 days ago
I don't know about Mythos, but in recent weeks I've noticed Opus constantly failing to fix things in tsz[0], whereas GPT 5.5 easily churns out fixes that are solid and pass tests. I've stopped paying for Claude for now, and all my money is going to OpenAI at the moment. Either Opus is massively nerfed or GPT 5.5 is really head and shoulders above it on very difficult tasks. The last percent of conformance tests in tsz are really, really difficult, and I've seen Opus bail again and again. So annoying to waste time and tokens only to finally get "this is too involved" or "this requires a multi-week sprint to fix".

[0] https://tsz.dev

2 comments

The new Opus feels like a step backwards. More expensive, thinks more, and it does not get the job done.
From a user’s perspective, 4.7 is a downgrade compared to 4.6. It’s intended to give Anthropic more control over their compute resources and profitability:

https://news.ycombinator.com/item?id=48072916

Having never used Claude and only Codex, does Claude actually say “this is too involved” as a response to a prompt?
Yes, it does. Usually after hours of working and not getting results.
I am curious, what kind of work do you use Claude for that sometimes requires hours of working? In my case, I have never seen it go off for more than 10 mins, and even that is very rare.
Debugging code. I had an issue, so I created a plan to root-cause it: run the code, change some functions or variables, and run again until we get a confirmed answer.

I just woke up to that very workflow this morning. It ran last night and finished at around 3am with ~200k tokens spent. It fixed the issue and created a follow-up doc for things that it could not verify.