by insomagent
139 days ago
I'm not super impressed with the performance, actually. I'm finding that it misunderstands me quite a bit. While it is definitely better at reading big codebases and finding a needle in a haystack, it's nowhere near as good as Opus 4.5 at reading between the lines and figuring out what I really want it to do, even with a pretty well-defined issue. It also has a habit of "running wild". If I say "first, verify you understand everything and then we will implement it," it DOES output its understanding of the issue, and its analysis is pretty spot-on. But, importantly, it does not correctly intuit my actual request: "First, explain your understanding of this issue to me so I can validate your logic. Then STOP, so I can read it and give you the go-ahead to implement." I think the main issue we are going to see with Opus 4.6 is this "running wild" phenomenon, which is step 1 of the eternal paperclip optimizer machine. So be careful, especially when using "auto accept edits".

As an example, I asked it to commit everything in the worktree. I stressed "everything" and prompted it very explicitly, because even 4.5 sometimes likes to say, "I didn't do that other stuff, I'm only going to commit my stuff even though he said everything."
It still only committed a few things.
I had to ask again.
And again.
I had to ask four times, with increasing amounts of expletives and threats, before I finally saw a clean worktree. At some point I was worried it was just going to solve the problem by cleaning the workspace without committing anything.
4.5 is way easier to steer, despite its warts.