| HN Mirror


	by al_borland 65 days ago
	This seems like the experience I've had with every model I've tried over the last several years. It seems like an inherent limitation of the technology, despite the hyperbolic claims of those financially invested in all of this paying off.

smt88 65 days ago

Opus 4.6 pre-nerf was incredible, almost magical. It changed my understanding of how good models could be. But that's the only model that ever made me feel that way.

whalesalad 65 days ago

Yes! I genuinely got a LOT of shit done with Opus 4.6 "pre nerf" with regular old out-of-the-box config, no crazy skills or hacks or memory tweaks or anything. The downfall is palpable. Textbook rugpull.

solenoid0937 64 days ago

There was no nerf - this meme needs to die.

smt88 64 days ago

What exactly happened then? How did we all have this collective hallucination?

solenoid0937 64 days ago

Collective hallucinations are common: the Mandela effect, people thinking FB is listening to their microphone because they see relevant ads, etc.

It's a common phenomenon: we all pattern-match to what we expect. When you learn a new vocabulary word, you see it everywhere for the next two days. When we think Claude might be nerfed, we overindex on every instance of Claude underperforming.

The only way to account for this is credible, hard data, like benchmarks over time. To this day, no one has provided evidence that Claude Code, when fixed to the same thinking level, has had degraded performance.

al_borland 64 days ago

Are there any good ways to benchmark models over time that don't fall victim to Goodhart's law? It seems that once the benchmark is defined, the AI will train on it, and it will become effectively meaningless.

I read many articles about AIs doing extremely well on various tests in graduate- or PhD-level programs. But these tests are well defined. A professor put the same models through his freshman CS class and most of them failed.

ec109685 64 days ago

Did they nerf the model, or was it changes to Claude Code? I agree it got frustrating.

al_borland 65 days ago

That was better, but still not to the point that I just let it go on my repo.
