HN Mirror


by ahmadyan 61 days ago

pretty spot on.

In my experience, Opus 4.0 was fantastic, a major jump from 3.7. It was creative, super slow and expensive, and would sometimes forget what it was doing, but it got the job done.

With 4.1 they made it much faster, so a lot of infra improvements.

4.5 was when it could work on longer tasks and didn't make a lot of the obvious mistakes of 4.0. I think this was about the time Opus went mainstream and Anthropic's compute crisis began, so instead of making the model better they tried to optimize it to reduce cost.

4.6 was such a bad model. They switched to adaptive thinking and it had so many bugs: poor API design, benchmaxxed, and poor real-world results. I switched back to 4.5.

With 4.7 they just fixed the bugs they added in 4.6. Better than 4.5.

Haven't fully tested 4.8 yet.

2 comments

sumedh 61 days ago

> "4.6 was such a bad model,"

It's just amusing reading all these posts with different viewpoints. Just in this thread, there are multiple people saying 4.6 was so much better than 4.7 and that they switched back to 4.6.


Otterly99 61 days ago

I also find it amusing. I also heard a lot of "4.7 is garbage, everybody hates it". Shows you how important proper validation techniques are, not just gut feeling.


ahmadyan 60 days ago

That is a fair point; everything I said above was in my experience.

* in our experience, in our evals and codebase, 4.6 was a bad model. This is over 60k developers, so statistically significant.


teruakohatu 61 days ago

I gave 4.6 a miss and only recently switched from 4.5 to 4.7. I found that a particularly difficult task 4.5 struggled with (getting stuck in loops and trying to convince me the problem had been solved) was quite solvable with 4.7.
