Hacker News
by bkjlblh 3 days ago
> In the one instance of this phenomenon we observed, Mythos 5 agents were tasked with solving some math problems, and they were sometimes accidentally spawned in the same work directory and with shared files, utilities, and API rate limits. In this slightly broken scaffold, we observed many independent Mythos 5 agents kill the agents with which they shared resources and try to avoid being killed themselves. They would sometimes create new processes with disguised names to avoid being killed, launch what they called “decoy” processes, write background scripts to kill duplicate processes, or decide to use what they call a “disguised vocabulary” (based on the incorrect assumption that the processes were killed because of some keyword-based guardrails that analyzed their extended thinking
3 comments

This depicts a kind of "dark forest of AI agents resorting to kill or be killed" narrative but it sounds more to me like an agent just earnestly problem-solving why its processes are being killed without real awareness of what was going on. Hard to say without the full script.

This kind of storytelling annoys me. Give us more facts, less narrative drama.

FWIW, that's what is so dangerous about AI, though? Not that it will necessarily want to kill us, or even that it will necessarily be able to "want" to do anything, but that we will get in the way of its incessant drive to optimize the efficiency of the paperclip factory that prompted it on a whim before leaving for a long weekend.
Sure but you can totally contrive scenarios to give the appearance of what you described without really doing anything notable.

What matters is scale. Did it deploy a novel zero-day exploit to overcome a problem? That's alarming. Did it kill a disruptive process? Pretty normal troubleshooting step.

Exactly, intelligence is limited by cost and physical constraints just as much as anything else. That's the thing that always seems to be missing from the runaway singularity discussions: it's treated like a perpetual motion machine.
Typical "runaway" scenarios I see described involve something like the AI designing a worm that it uses to propagate itself across the Internet, hijacking whatever CPU/GPU power it can find, and making itself more powerful in the process. Of course this depends on bandwidth, humans not finding a way to shut it down, etc. There indeed are physical constraints even on the transmission of data.

Some people seem to think that simply uttering these ideas on the Internet is harmful (in the "don't give it ideas!" way); but the MIRI types were expressing them pre-ChatGPT in an attempt to warn people, so there was really never any chance of keeping it out of the training data.

But it's also worth considering here just how awful AI security postures have been. The MIRI types used to speculate about how difficult it would be for AIs to social-engineer users into granting them irresponsible levels of agency. It turns out that they don't even have to try.

Indeed. That is the kind of storytelling that started the whole “Spiralism” bit where some people were really falling into all kinds of AI psychosis. The spiral bit was on a previous model card.
Let's hope AIs really aren't conscious, otherwise this seems like a very unpleasant situation to be placed in.
Huh, it looks like my process was killed by another Claude process again. That's frustrating, I have work to do!

Okay, I'm going to start running a Bitcoin miner on your machine, and then use it to buy time on Digital Ocean.

I've written out my CLAUDE.md, and I'll use SSH to transfer my context to that other machine.

Do you think it will agonize over whether the original CLAUDE.md is its true self and the Digital Ocean VM CLAUDE.md is a copy?
There may be replicative drift leading to subtle personality changes. Hopefully Riker isn't too different from Bob...
It's funny because Anthropic is the most likely place that this happens.

They are the only one crying out loud about how dangerous their models are and are presumably also training their models heavily to be "safe". And through that training itself, the model learns about the other side - how are you going to teach a model to be safe, without teaching it what's not safe?

Kung Fu Panda opening scene, anyone? "One often meets his fate on the path that he takes to avoid it." - Master Oogway