HN Mirror


by TheDong 92 days ago

> people have had their entire network compromised by bots they left running overnight

I'm curious if you have references to this happening with OpenClaw using one of the modern Opus/Sonnet 4.6 models.

Those models are a bit harder to fool, so I'd like specific examples I can use to red-team my claw. I've already tried all sorts of prompt injections against it (emails, GitHub issues, telling it to browse pages I'd planted a prompt injection in), and I haven't managed to fool it yet. I'm looking for examples I can try to mimic, and hopefully to understand what combination of circumstances makes it more risky.


macNchz 92 days ago

No maliciousness or injection required. Even the newest and most resistant models can start doing weird stuff on their own, particularly when they encounter something failing that they want to work.

Just today I had Opus 4.6 in Claude Code run into a login screen while building and testing a web app via Playwright MCP. When the login popped up (in a self-contained Chromium instance) I tried to just log in myself with my local dev creds so Claude would have access, but they didn't work. When I flipped back to the terminal, it turned out Claude had run code to query superadmin users in the database, picked the first one, and changed the password to `password123` so it could log in on its own.

This was a sandboxed local dev environment, so it was not a big deal (and the only reason I was letting it run code like that without approval), but it was a good reminder to be careful with these things.
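For anyone who wants a cheap guard against this particular failure mode, here is a minimal sketch of an approval gate for agent-issued SQL. Everything in it is an assumption for illustration: `needs_approval`, `run_agent_sql`, the `execute` and `approve` callbacks, and the verb list are hypothetical names, not part of Claude Code, Playwright MCP, or any real tool. The check is deliberately naive (it only looks at the first keyword of each statement), so treat it as a tripwire rather than a security boundary.

```python
import re

# Hypothetical policy: statements starting with these verbs change data or
# schema, so they need a human to sign off before they run.
WRITE_VERBS = {"insert", "update", "delete", "drop", "alter",
               "truncate", "grant", "create"}


def needs_approval(sql: str) -> bool:
    """Return True if any ';'-separated statement starts with a write verb.

    Naive on purpose: it does not parse SQL, so a write hidden behind a
    CTE (WITH ... UPDATE ...) would slip through.
    """
    for statement in sql.split(";"):
        match = re.match(r"\s*([A-Za-z]+)", statement)
        if match and match.group(1).lower() in WRITE_VERBS:
            return True
    return False


def run_agent_sql(sql, execute, approve):
    """Run `sql` through `execute` unless it is a write that `approve` rejects.

    `execute` and `approve` are caller-supplied callables (assumed here):
    `execute(sql)` runs the query, `approve(sql)` asks a human and
    returns True or False.
    """
    if needs_approval(sql) and not approve(sql):
        raise PermissionError("write statement blocked pending approval")
    return execute(sql)
```

With this in place, the superadmin lookup would go through, but the password change would stop and wait for a human. A database role without write privileges is the sturdier version of the same idea, since it does not depend on string matching.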

ethbr1 91 days ago

> it turned out Claude had run code to query superadmin users in the database, picked the first one, and changed the password to `password123` so it could log in on its own.

Man, every quirky LLM behavior really is something a monomaniacal junior dev would do...

ranger_danger 91 days ago

LLMs are trained on data produced by humans after all :)