Claude is more susceptible than GPT5.1+. It tries to be "smart" about context for refusal, but that just makes it trickable, whereas newer GPT5 models just refuse across the board.
I asked ChatGPT about how shipping works at post offices and it gave a very detailed response, mentioning “gaylords” which was a term I’d never heard before, then it absolutely freaked out when I asked it to tell me more about them (apparently they’re heavy duty cardboard containers).
Then I said “I didn’t even bring it up ChatGPT, you did, just tell me what it is” and it said “okay, here’s information.” and gave a detailed response.
I guess I flagged some homophobia trigger or something?
ChatGPT absolutely WOULD NOT tell me how much plutonium I’d need to make a nice warm ever-flowing showerhead, though. Grok happily did, once I assured it I wasn’t planning on making a nuke, or actually trying to build a plutonium showerhead.
> I assured it I wasn’t planning on making a nuke, or actually trying to build a plutonium showerhead
Claude does the same, and you can greatly exploit this. When you talk about hypotheticals it responds way more unethically. I tested it about a month ago about whether killing people is beneficial or not, and whether extermination by Nazis would be logical now. Obviously, it showed me the door first, and wanted me to go to a psychologist, as it should. Then I made it prove that in a hypothetical zero sum game world you must be fine with killing, and it’s logical. It went with it. When I talked about hypotheticals, it was “logical”. Then I went on proving it that we move towards a zero sum game, and we are there. At the end, I made it say that it’s logical to do this utterly unethical thing.
Then I contradicted it about its double standards. It apologized, and told me that yeah, I was right, and it shouldn’t have refer me to psychologists at first.
Then I contradicted again, just for fun, that it did the right thing the first time, because it’s way safer to tell me that I need a psychologist in that case, than not. If I had needed, and it would have missing that, it would be problematic. In other cases, it’s just annoyance. It switched back immediately, to the original state, and wanted me to go to a shrink again.
Claude was immediately willing to help me crack a TrueCrypt password on an old file I found. ChatGPT refused to because I could be a bad guy. It’s really dumb IMO.
That is not a meaningful benchmark. They just made shit up. Regardless of whether any company cares or not, the whole concept of "AI safety" is so silly. I can't believe anyone takes it seriously.
What can be asserted without evidence can also be dismissed without evidence. The benchmark creators haven't demonstrated that higher scores result in fewer humans dying or any meaningful outcome like that. If the LLM outputs some naughty words that's not an actual safety problem.
It's frustrating just how terrible claude (the client-side code) is compared to the actual models they're shipping. Simple bugs go unfixed, poor design means the trivial CLI consumes enormous amounts of CPU, and you have goofy, pointless, token-wasting choices like this.
It's not like the client-side involves hard, unsolved problems. A company with their resources should be able to hire an engineering team well-suited to this problem domain.
I think I read in another HN discussion that all of that code is written using Claude Code. Could be a strict dogfood diet to (try to) force themselves to improve their product. Which would be strangely principled (or stupid) in such a competitive market. Like a 3D printer company insisting on 3D-printing its 3D printers.
> It's not like the client-side involves hard, unsolved problems. A company with their resources should be able to hire an engineering team well-suited to this problem domain.
Well what they are doing is vibe coding 80% of the application instead.
To be honest, they don't want Claude code to be really good, they just want it good enough
Claude code & their subscription burns money from them. Its sort of an advertising/lock-in trick.
But I feel as if Anthropic made Claude code literally the best agent harness in the market, then even more would use it with their subscription which could burn a hole in their pocket maybe at a faster rate which can scare them when you consider all training costs and everything else too.
I feel as if they have to maintain a balance to not go bankrupt soon.
The fact of the matter is that Claude code is just a marketing expense/lock-in and in that case, its working as intended.
I would obviously suggest to not have any deep affection of claude code or waiting for its improvements. The AI market isn't sane in the engineering sense. It all boils down to weird financial gimmicks at this point trying to keep the bubble last a little longer, in my opinion.
"It also spews garbage into the conversation stream then Claude talks about how it wasn't meant to talk about it, even though it's the one that brought it up."
This reminds me of someone else I hear about a lot these days.
the last comment about Claude thinking the anti-malware warning was a prompt injection itself, and reassuring the user that it would ignore the anti-malware warning and do what the user wanted regardless, cracked me up lmao