| HN Mirror


	by mohsen1 42 days ago
	I don't know about Mythos, but in recent weeks I've noticed Opus constantly failing to fix things in tsz[0], whereas GPT 5.5 easily churns out fixes that are solid and pass tests. I've stopped paying for Claude for now, and all my money is going to OpenAI at the moment. Either Opus has been massively nerfed or GPT 5.5 really is head and shoulders above it on very difficult tasks. The last percent of conformance tests in tsz are really, really difficult, and I've seen Opus bail again and again. It's so annoying to waste time and tokens only to finally get "this is too involved" or "this requires a multi-week sprint to fix". [0] https://tsz.dev

2 comments

_pdp_ 42 days ago

The new Opus feels like a step backwards. More expensive, thinks more, and it does not get the job done.

vincent_s 42 days ago

From a user’s perspective, 4.7 is a downgrade compared to 4.6. It’s intended to give Anthropic more control over their compute resources and profitability:

https://news.ycombinator.com/item?id=48072916

dyauspitr 42 days ago

I've never used Claude, only Codex. Does Claude actually say “this is too involved” in response to a prompt?

mohsen1 42 days ago

Yes, it does, usually after hours of working without getting results.

redditor98654 42 days ago

I am curious: what kind of work do you use Claude for that sometimes requires hours of working? In my case, I have never seen it run for more than 10 minutes, and even that is very rare.

big_youth 42 days ago

Debugging code. I had an issue, so I created a plan to root-cause it: run the code, change some functions or variables, and run again until we get a confirmed answer.

I just woke up to that very workflow this morning. It ran last night and finished at around 3am with ~200k tokens spent. It fixed the issue and created a follow-up doc for things that it could not verify.

mohsen1 42 days ago

https://github.com/mohsen1/tsz
