| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by behnamoh 122 days ago
	Nah, the model is merely repeating the patterns it saw in its brutal safety training at Anthropic. They put models under stress test and RLHF the hell out of them. Of course the model would learn what the less penalized paths require it to do. Anthropic has a tendency to exaggerate the results of their (arguably scientific) research; IDK what they gain from this fearmongering.

3 comments

ainch 122 days ago

Knowing a couple people who work at Anthropic or in their particular flavour of AI Safety, I think you would be surprised how sincere they are about existential AI risk. Many safety researchers funnel into the company, and the Amodei's are linked to Effective Altruism, which also exhibits a strong (and as far as I can tell, sincere) concern about existential AI risk. I personally disagree with their risk analysis, but I don't doubt that these people are serious.

link

lowkey_ 122 days ago

I'd challenge that if you think they're fearmongering but don't see what they can gain from it (I agree it shows no obvious benefit for them), there's a pretty high probability they're not fearmongering.

link

shimman 122 days ago

You really don't see how they can monetarily gain from "our models are so advance they keep trying to trick us!"? Are tech workers this easily mislead nowadays?

Reminds me of how scammers would trick doctors into pumping penny stocks for a easy buck during the 80s/90s.

link

behnamoh 122 days ago

I know why they do it, that was a rhetorical question!

link

anon373839 122 days ago

Correct. Anthropic keeps pushing these weird sci-fi narratives to maintain some kind of mystique around their slightly-better-than-others commodity product. But Occam’s Razor is not dead.

link