Hacker News new | ask | show | jobs
by noxa 84 days ago
It's gotten very bad. It was degrading since late Feb and since March 8th has become unusable. "Simplest fix" and "You're right, I'm sorry" are strong indicators. It went from senior engineer to entitled intern, and I went from having a team of peers to a lazy jerk who only tries to cut corners. I've got quantitative analytics of it, too. Briefly the other day for about 24 hours it returned to normal, and then someone flipped the switch again mid-session. I was a massive proponent of Claude/Opus, and for the last several weeks have felt rug-pulled. It's such an obvious degradation that even non-technical friends have noticed it. It's optimizing for minimum effort instead of correct and clean solutions. It sucks, because had I experienced it like this from the start I'd have bounced from agentic coding and never looked back - unfortunately, I thought it'd only get better and adjusted my workflow around it. When my Qwen3.5 27B local model gets into fewer reasoning loops than Opus does, it makes me wonder if anyone there cares or if they are just chasing IPO energy from scaling.

I had to build a stop hook to catch it's garbage, and even then it's not enough. I had 30min-1hr uninterrupted sessions (some slipstreamed comments), and now I can't get a single diff that I can accept without comment. Half of the work it does is more destructive than helpful (removing comments from existing code, ignoring directives and wandering off into nowhere, etc).

From 2 weeks after installing the stop hook (around March 8th): ``` Breakdown of the 173 violations:

    73x ownership dodging (caught saying variants of "not caused by my changes")
    40x unnecessary permission-seeking ("should I continue?", "want me to keep going?")
    18x premature stopping ("good stopping point", "natural checkpoint")
    14x "known limitation" dodging
    14x "future work" / "known issue" labeling
    Various: "next session", "pause here", etc.
Peak day: March 18 with 43 violations in a single day. ```

Other one is loops in reasoning, which are something I'm familiar with on small local models, not frontier ones: ``` Sessions containing 5+ instances of reasoning-loop phrases ("oh wait", "actually,", "let me reconsider", "I was wrong"): Period Sessions with 5+ loops Before March 8 0 After March 8 7 (up to 23 instances in one session) ``` (I've even had it write code where it has "Wait, actually, we should do X" in comments in the code!)

The worst is the dodging; it said, literally, "not my code, not my problem" to a build failure it created 5 messages ago in the same session. ``` I had to tell Claude "there's no such thing as [an issue that existed before your changes]" on average:

    Once per week in January
    2-3 times per week in February
    Nearly daily from March 8 onward
```

Honestly, just venting, because I'm extremely depressed. I had the equivalent of a team of engineers I could trust, and overnight someone at Anthropic flicked a switch and killed them. I'm getting better results from random models on OpenRouter now (and OmniCoder 9B! 9B!). They aren't _good_ results, mind you, but they aren't idiotic.

Sad. Very sad.

2 comments

Reading your report made me quite depressed too. This is world changing technology. It feels like I was only allowed to have a glimpse of it before it was taken away. I hope things get better in the future...
I hear you and I am really hoping more people notice this obvious degradation than dismiss this as workflow or prompt or context saturation issues.

It isn’t obvious but hope the guys managing this realize what kind of confusion and doubt (or self doubt) that this creates in people and will have a long term impact on usage of their models.

I am going to try removing every and all plugins (i only have all Anthropic’s plugins like superpowers) and see if that makes any difference.

Yeah, I went through a week or two of configuration changes trying to figure out what I could have done to make it behave that way, and it wasn't until it repaired itself and then the next morning went back to idiot-mode mid-response that I finally knew it was not me. Same task, same session, same cc version, same prompts, same context, so I'm confident it was a configuration change on their end.

In case anyone can correlate, the recovery happened on March 24th and then re-regressed at approximately 3:09 PM PST (23:09 UTC) on March 25. Flipped right back into "simplest" solutions, and "You're right, I'm sorry" mode:

> "You're right. That was lazy and wrong. I was trying to dodge a code generator issue instead of fixing it."

> "You're right — I rushed this and it shows. Let me be deliberate about the structure before writing."

> "You're right, and I was being sloppy. The CPU slab provider's prefault is real work."