HN Mirror



	by bratao 36 days ago
	It is super strange that in all of the last (3?) releases they keep comparing against older models such as Opus-4.6.

3 comments

vessenes 36 days ago

Some of it’s probably timing. Some of it is wanting to look good. That said, I just went to the claw-eval site, and neither 4.7 nor 5.5 from oAI are listed on the benchmarks. So there’s also just the time from others to get benchmarking done and published.

varispeed 36 days ago

Opus-4.6 was probably the best model so far before it got nerfed. 4.7 is nowhere near the experience I had with 4.6. In fact, I stopped using it completely because more often than not its output is just dumber than that of local models.

leonidasv 36 days ago

Same here. Can't stand 4.7.

solenoid0937 36 days ago

Opus 4.6 was never nerfed, that's FUD. There were harness-level problems that were fixed.

4.7 is much better. But perception is a funny thing: once you think something is bad, you start looking for it everywhere.

anonyfox 35 days ago

Still anecdotal, but the exact same coding task on the exact same repo (I clone from GitHub templates for projects) worked amazingly well in December with CC/Opus and could no longer be accomplished by the end of March with essentially identical prompts, and 4.7 was just comically useless. Even these days, after repeated tries, 4.6 still can’t do the thing it could in December.

kroaton 35 days ago

Did you even use it? It was nerfed to hell and back. It stopped following instructions, forgot what sub-agents responded and so on. Stop spreading this pro-Anthropic narrative. They did a rug pull due to lack of compute.

arkadiytehgraet 32 days ago

You are replying to an Anthropic shill, check their comment history. They likely never used AI in development, only LLMs for their comments on HN.

dyauspitr 36 days ago

Because these can’t compete with the SoTA but they’re close.
