Hacker News
by kolinko 19 days ago
On the other hand, I found Claude/Opus to be extremely unhelpful when asked to benchmark itself against a possible replacement.

It will get "confused", make up numbers, do a ton of other things, and I'm quite sure it is subtly sabotaging the process to show that there is no point replacing it.

I mean, Opus is not perfect, but the number of "mistakes" it starts making when you ask it to benchmark itself makes me suspect they are intentional. At least in my system/harness.

2 comments

No, they are always like that.

It's really easy (and tempting) to incorrectly impute all sorts of human motives to these things, but it's no more valid than assuming your Magic 8-Ball is being coy.

You didn't add "never hallucinate or make anything up" to the prompt. Rookie mistake.