by bushido 66 days ago
I think my results have actually become worse with Opus 4.7.

I have a pretty robust setup in place to make sure that Claude, despite its degradations, produces good-quality output. And even the lobotomized 4.6 from the last few days was doing better than 4.7 is doing right now at xhigh.

It's over-engineering. It is producing more code than it needs to. It is trying to be more defensive, but its idea of defensive seems shaky, because it's ending up creating more edge cases. I think they just found a way to make it more expensive, because I'm just gonna have to burn more tokens to keep it in check.


Maybe this? From the article:

> Opus 4.7 is substantially better at following instructions. Interestingly, this means that prompts written for earlier models can sometimes now produce unexpected results: where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally. Users should re-tune their prompts and harnesses accordingly.

Possible, but very unlikely.

One of the hard rules in my harness is that it has to provide a summary before performing a specific action. There is zero ambiguity in that rule. It is terse, and it is specific.

In the last 4 sessions (of 4 total), it has tried to skip that step, and every time I pointed it out, it replied with something like the following.

> You're right — I skipped the summary. Here it is.

It is not following instructions literally. I wish it was. It is objectively worse.

Using hooks can help.
Not sure it is better at following instructions. One of the first issues I had with it was it doing the thing it was specifically forbidden from doing. When told, it replied: "oh sorry, I had a note that I should not do it in my MEMORY but I did it anyway".