Hacker News
by drakythe 35 days ago
He's made so many statements that fall under the "boy who cried wolf" category that even if he _does_ believe these statements, he needs to be managed better. I'll never forget Anthropic's huge "Oh my God, the AI blackmailed a researcher to save itself!" moment, when the prompt effectively told the AI to do that and gave it forged emails with easy blackmail targets, as if this weren't a common trope in mystery and suspense books, television, and fanfiction, all of which Claude (and others) have been trained on.

It's a common trope, all through the training data, and all the modern AIs have read it and would probably act similarly? Is that what we should take away from your comment? So we have nothing to worry about. Makes sense. Really, it's just a common trope.
Oh, of course wolves have sharp teeth; they're predators. Anyone who knows this can never be bitten.
I'm saying the existence of the trope, within the training data, and the experimental setup, negate the breathless "Oh my god it did something unexpected in order to preserve itself!" as if an LLM has any sense of identity or self.

Many, many other bad things are in the training data. For an example of how this can manifest in harmful ways that people don't seem to be discussing much, check out the recent Behind the Bastards episodes about how an AI chatbot became a cult leader. (The title is an exaggeration, which the host explains while raising some excellent points about how LLMs have ingested a lot of cult-leader material and can therefore mimic those speech patterns and affect people who are vulnerable to such things.)

Imagine you're in a car and the car is driving towards a cliff. You shout at the driver "oh my god we're about to go over a cliff!" And he says "you said that two seconds ago, but we're still alive, you're just like the boy who cried wolf. Do you know exactly when we're going to go over a cliff? No? Maybe you're imagining the cliff."

I think it's very improbable that AI is as dangerous as Yud et al. fear it is. But it's too soon to say, and there seems to be significant long-tail risk. Mocking or criticizing people for being concerned about that risk seems counterproductive.

Seems like the life cycle of huge tech companies like Meta, Google, Microsoft, and Amazon is "do whatever's necessary to take over the world, then enshittify." I don't take it for granted that Amodei and Anthropic seem to not quite be maximally power-hungry.

Re: second half of your comment. Understanding a threat doesn't neutralize it. Anthropic didn't make that big a deal of it either; it was news articles that blew it out of proportion.