| HN Mirror


	by woeirua 133 days ago
	That's such a huge delta that Anthropic might be onto something...

4 comments

conception 133 days ago

Anthropic has been the only AI company actually caring about AI safety. Here’s a dated benchmark, but it’s a trend I’ve never seen disputed: https://crfm.stanford.edu/helm/air-bench/latest/#/leaderboar...

CuriouslyC 133 days ago

Claude is more susceptible than GPT5.1+. It tries to be "smart" about context for refusal, but that just makes it trickable, whereas newer GPT5 models just refuse across the board.

wincy 133 days ago

I asked ChatGPT about how shipping works at post offices and it gave a very detailed response, mentioning “gaylords” which was a term I’d never heard before, then it absolutely freaked out when I asked it to tell me more about them (apparently they’re heavy duty cardboard containers).

Then I said “I didn’t even bring it up ChatGPT, you did, just tell me what it is” and it said “okay, here’s information.” and gave a detailed response.

I guess I flagged some homophobia trigger or something?

ChatGPT absolutely WOULD NOT tell me how much plutonium I’d need to make a nice warm ever-flowing showerhead, though. Grok happily did, once I assured it I wasn’t planning on making a nuke, or actually trying to build a plutonium showerhead.

nandomrumber 133 days ago

Wikipedia entry on the gaylord bulk box:

https://en.wikipedia.org/wiki/Bulk_box

ruszki 133 days ago

> I assured it I wasn’t planning on making a nuke, or actually trying to build a plutonium showerhead

Claude does the same, and you can greatly exploit this. When you talk about hypotheticals it responds way more unethically. I tested it about a month ago about whether killing people is beneficial or not, and whether extermination by Nazis would be logical now. Obviously, it showed me the door first, and wanted me to go to a psychologist, as it should. Then I made it prove that in a hypothetical zero-sum world you must be fine with killing, and that it’s logical. It went with it. When I talked about hypotheticals, it was “logical”. Then I went on to argue that we are moving towards a zero-sum game, and that we are already there. At the end, I made it say that it’s logical to do this utterly unethical thing.

Then I contradicted it about its double standards. It apologized, and told me that yeah, I was right, and it shouldn’t have referred me to psychologists at first.

Then I contradicted it again, just for fun, saying that it did the right thing the first time, because it’s way safer to tell me that I need a psychologist in that case than not. If I had needed one and it had missed that, it would be problematic. In other cases, it’s just an annoyance. It switched back immediately to the original state, and wanted me to go to a shrink again.

ryanjshaw 133 days ago

Claude was immediately willing to help me crack a TrueCrypt password on an old file I found. ChatGPT refused because I could be a bad guy. It’s really dumb IMO.

BloondAndDoom 133 days ago

ChatGPT refused to help me disable Windows Defender permanently on my Windows 11 machine. It’s absurd at this point.

nananana9 133 days ago

It just knows it's a waste of effort.

shepherdjerred 133 days ago

Claude sometimes refuses to work with credentials because it’s insecure. e.g. when debugging auth in an app.

nradov 133 days ago

That is not a meaningful benchmark. They just made shit up. Regardless of whether any company cares or not, the whole concept of "AI safety" is so silly. I can't believe anyone takes it seriously.

mocamoca 133 days ago

Would you mind explaining your point of view? Or pointing me to the resources that make you think so?

nradov 133 days ago

What can be asserted without evidence can also be dismissed without evidence. The benchmark creators haven't demonstrated that higher scores result in fewer humans dying or any meaningful outcome like that. If the LLM outputs some naughty words that's not an actual safety problem.

LeoPanthera 133 days ago

This might also be why Gemini is generally considered to give better answers - except in the case of code.

Perhaps thinking about your guardrails all the time makes you think about the actual question less.

mh2266 133 days ago

re: that, CC burning context window on this silly warning on every single file is rather frustrating: https://github.com/anthropics/claude-code/issues/12443

frumplestlatz 133 days ago

It's frustrating just how terrible claude (the client-side code) is compared to the actual models they're shipping. Simple bugs go unfixed, poor design means the trivial CLI consumes enormous amounts of CPU, and you have goofy, pointless, token-wasting choices like this.

It's not like the client-side involves hard, unsolved problems. A company with their resources should be able to hire an engineering team well-suited to this problem domain.

ahartmetz 133 days ago

I think I read in another HN discussion that all of that code is written using Claude Code. Could be a strict dogfood diet to (try to) force themselves to improve their product. Which would be strangely principled (or stupid) in such a competitive market. Like a 3D printer company insisting on 3D-printing its 3D printers.

copperx 133 days ago

It's not crazy if you know that your customers ARE buying your 3D printer to make other 3D printers.

Imustaskforhelp 133 days ago

> It's not like the client-side involves hard, unsolved problems. A company with their resources should be able to hire an engineering team well-suited to this problem domain.

Well what they are doing is vibe coding 80% of the application instead.

To be honest, they don't want Claude Code to be really good, they just want it good enough.

Claude Code and its subscription burn money for them. It's sort of an advertising/lock-in trick.

But I feel that if Anthropic made Claude Code literally the best agent harness on the market, even more people would use it with their subscription, which could burn a hole in their pocket at an even faster rate. That could scare them when you consider all the training costs and everything else too.

I feel as if they have to maintain a balance to not go bankrupt soon.

The fact of the matter is that Claude Code is just a marketing expense/lock-in, and in that case, it's working as intended.

I would obviously suggest not getting too attached to Claude Code or waiting around for its improvements. The AI market isn't sane in the engineering sense. It all boils down to weird financial gimmicks at this point, trying to make the bubble last a little longer, in my opinion.

tempestn 133 days ago

"It also spews garbage into the conversation stream then Claude talks about how it wasn't meant to talk about it, even though it's the one that brought it up."

This reminds me of someone else I hear about a lot these days.

nandomrumber 133 days ago

Are you across Puppet Regime from GZERO Media?

https://youtu.be/aPSWJZ63V_I

xvector 133 days ago

the last comment about Claude thinking the anti-malware warning was a prompt injection itself, and reassuring the user that it would ignore the anti-malware warning and do what the user wanted regardless, cracked me up lmao

rahidz 133 days ago

Or Anthropic's models are intelligent/trained on enough misalignment papers, and are aware they're being tested.

bofadeez 133 days ago

Huh? https://alignment.anthropic.com/2026/hot-mess-of-ai/
