|
|
|
|
|
by kolinko
19 days ago
|
|
On the other hand, I found Claude/Opus to be extremely unhelpful when it comes to asking it to benchmark itself with a possible replacement. It will get "confused", make up numbers, do a ton of other things, and I'm quite sure it is subtly sabotaging the process to show that there is no point replacing it. I mean, Opus is not perfect, but the amount of "mistakes" it begins to do when you ask it to benchmark itself makes me suspect they are intentional. At least my system/harness. |
|
It's really easy (and tempting) to incorrectly impute all sorts of human motives to these things, but it's no more valid than assuming your Magic 8-Ball is being coy.