by godelski
170 days ago
What's concerning to many of us is that you (and others) have said this same thing s/Opus 4.5/some other model/. That feels more like chasing than a clear line of improvement. It's very different from something like "my habits have changed quite a bit since reading The Art of Computer Programming". They're categorically different.
Some of these improvements have been minor, some of them have been big enough to feel like step changes. Sonnet 3.7 + Claude Code (they came out at the same time) was a big step change; Opus 4.5 similarly feels like a big step change.
(If you don't trust vibes, METR's task completion benchmark shows huge improvements, too.)
If you're sincerely trying these models out with the intention of seeing if you can make them work for you, and doing all the things you should do in those cases, then even if you're getting negative results somehow, you need to keep trying, because there will come a point where the negative turns positive for you.
If you're someone who's been using them productively for a while now, you need to keep changing how you use them, because what used to work is no longer optimal.