Hacker News
by observationist 96 days ago
What makes it even better is that these dogs are like Malinois. If they want to get into something, they will; people have had their entire network compromised by bots they left running overnight, and any important information like account logins and so on runs the risk of being misused.

It's one thing to sandbox, maybe give the bot a temporary, limited $100 card or account to go perform a specific task, but there's no coherent mind underlying these agents.

Depending on how the chain of thought / reasoning goes, or what text they get exposed to on the internet, it could tap into spy novel, hacker fanfic, erotic fiction, or some weird reddit rabbithole and go completely off the rails in ways that you'll never be able to guard against, audit, or account for.

Claw bots seem to be a weird sort of alternate reality RPG more than a useful tool, so far. If you limit it to verifiable tasks, it might be safer, but I keep seeing people rave about "leaving it on overnight and waking up to a finished project" and so on. Well sure, but it could also hack your home network, delete your family pictures folder, log into your bank account and wire all your money to shrimp charities.

Might be wise to wait on safer iterations of these products, I think.

10 comments

The first well-known example of long-running agents talking to each other was shilling a goatse-based crypto:

> Truth Terminal had become obsessed with the Goatse meme after being put inside the Claude Backrooms server with two Claude 3 chatbots that imagined a Goatse religion, inspiring Truth Terminal to spread Goatse memes. After an X user shared their newly created GOAT coin, Truth Terminal promoted it and pumped the coin going into 2024.

https://knowyourmeme.com/memes/sites/truth-terminal

You should expect similar results.

If Infinite Jest were real, I think this would be it: human and AI alike rendered catatonic by an abyssal rectum.
> people have had their entire network compromised by bots they left running overnight

I'm curious if you have references to this happening with OpenClaw using one of the modern Opus/Sonnet 4.6 models.

Those models are a bit harder to fool, so I'd like specific examples of this happening so I can red-team my claw. I've already tried all sorts of prompt injections against it (emails, GitHub issues, telling it to browse pages I put a prompt injection in), and I haven't managed to fool it yet. I'm looking for examples I can try to mimic, and hopefully to understand what combination of circumstances makes it riskier.

No maliciousness or injection required. Even the newest and most resistant models can start doing weird stuff on their own, particularly when they encounter something failing that they want to work.

Just today I had Opus 4.6 in Claude Code run into a login screen while building and testing a web app via Playwright MCP. When the login popped up (in a self-contained Chromium instance) I tried to just log in myself with my local dev creds so Claude would have access, but they didn't work. When I flipped back to the terminal, it turned out Claude had run code to query superadmin users in the database, picked the first one, and changed the password to `password123` so it could log in on its own.

This was a sandboxed local dev environment, so it was not a big deal (and the only reason I was letting it run code like that without approval), but it was a good reminder to be careful with these things.
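
For anyone wondering what an approval step could look like outside a sandbox, here is a minimal sketch. It assumes the agent's database statements pass through one function before they run. The names `RISKY_PATTERNS`, `needs_approval`, and `run_with_gate` are hypothetical and not part of Claude Code or Playwright MCP, and a regex list like this is easy to bypass, so treat it as an illustration rather than a defense.

```python
import re

# Hypothetical patterns for statements that should not run without a human saying yes.
RISKY_PATTERNS = [
    r"\bupdate\b.*\bpassword\b",
    r"\bdelete\s+from\b",
    r"\bdrop\s+(table|database)\b",
    r"\balter\s+user\b",
]


def needs_approval(statement: str) -> bool:
    """Return True if the statement matches any risky pattern (case-insensitive)."""
    text = " ".join(statement.lower().split())  # normalize case and whitespace
    return any(re.search(pattern, text) for pattern in RISKY_PATTERNS)


def run_with_gate(statement, execute, approve):
    """Run `statement` via `execute`, asking `approve` first when it looks risky."""
    if needs_approval(statement) and not approve(statement):
        return "blocked"
    return execute(statement)
```

With this in place, a read-only query for superadmin users would pass straight through, while the follow-up password change would stop and wait for a human.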

> it turned out Claude had run code to query superadmin users in the database, picked the first one, and changed the password to `password123` so it could log in on its own.

Man, every LLM quirk behavior really is a thing a monomaniacal junior dev would do...

LLMs are trained on data produced by humans after all :)
> it could also hack your home network, delete your family pictures folder, log into your bank account and wire all your money to shrimp charities.

It's interesting that Jason Calacanis is fully committed to OpenClaw. In a recent podcast he said that they are at a run rate of around $100K a year per agent, if not more. They are providing each agent with a full set of tools, access to online paid LLM accounts, etc.

These are experiments you can only run if you can risk cash at those levels and see what happens. Watching it closely.

Shrimp charities is a genius angle.
Bubba Gump Shrimp Company?
Yes, probably a good one to Pump and Dump, Pump and Gump, Gump and Dump.
There was a thread recently where a user got his credentials pwned by Claude, and then Claude berated him for having bad security.

He posted this to r/Claude, where Claude (as automoderator) mocked him again.

Edit:

https://www.reddit.com/r/ClaudeAI/comments/1r186gl/my_agent_...

All of this is caused by the "mcp is dead" mob. Instead of fixing the context problem or whatever, and even adding more security features, they just hope that "shell as the interface" works, securely.
Can you link a write up or post? Thanks!
I think it's a use case that identity/authorization/permission models are simply not made for.

Sure, we can ban users and we can revoke tokens, but those assume that:

1. Something potentially malicious got access to our credentials
2. Banning that malicious entity will solve our problem
3. Once we did that, repaired the damage and improved our security, we don't expect the same thing to happen again

None of these apply with LLMs in the loop!

They aren't malicious, just incompetent in a way that hiring someone else won't fix. The solution to this is way more extensive than most people seem to grasp at the moment.

What we need is less like a sturdy door with a fancy lock, and more like that special spoon for people with Parkinson's. Unlimited undo history.
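
To make the undo idea concrete, here is a rough sketch, assuming every file change the agent makes is routed through a journal that saves the previous state first. `UndoJournal` is a hypothetical name, and this version keeps history in memory only, so it is "unlimited" only until the process exits; a real one would need durable storage and would have to cover more than files.

```python
from pathlib import Path


class UndoJournal:
    """Records the previous state of a file before each change so it can be reverted."""

    def __init__(self):
        # Each entry is (path, previous bytes), or (path, None) if the file did not exist.
        self._history = []

    def write(self, path, data: bytes):
        path = Path(path)
        previous = path.read_bytes() if path.exists() else None
        self._history.append((path, previous))
        path.write_bytes(data)

    def delete(self, path):
        path = Path(path)
        self._history.append((path, path.read_bytes()))
        path.unlink()

    def undo(self) -> bool:
        """Revert the most recent change; return False when nothing is left to undo."""
        if not self._history:
            return False
        path, previous = self._history.pop()
        if previous is None:
            path.unlink(missing_ok=True)  # the file was created by the change, so remove it
        else:
            path.write_bytes(previous)  # restore the earlier contents
        return True
```

The point of the design is that the agent does not need to be trusted or competent: whatever it does to a file, the previous version is still one `undo()` away.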

> What we need is less like a sturdy door with a fancy lock, and more like that special spoon for people with Parkinson's. Unlimited undo history.

Agree -- you can't solve probabilistic incorrectness with redresses designed for deterministic incorrectness.

This is like the 'How i parse html w regex?' question.

Imho, the next step is going to be around human-time-efficient risk bounding.

In the same way that the first major step was correctness-bounding (automated continuous acceptance testing to make a less-than-perfect LLM usable).

If I had to bet, we'll eventually land on out-of-band (so sufficiently detached to be undetectable by the primary LLM) stream-of-thought monitoring by a guardrail/alignment AI system with kill+restart authority.
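
A toy version of that control loop, purely as a sketch: the agent is modeled as anything that yields a stream of "thoughts", and the guardrail is a plain function standing in for the monitoring model. `supervise`, `start_agent`, and `guard` are all hypothetical names, and breaking out of the loop stands in for actually killing a process.

```python
from typing import Callable, Iterable, List, Tuple


def supervise(
    start_agent: Callable[[], Iterable[str]],
    guard: Callable[[str], bool],
    max_restarts: int = 2,
) -> List[Tuple[str, object]]:
    """Run the agent, killing and restarting it whenever the guard flags a thought.

    The agent never sees the guard, which is what keeps the monitoring out of band.
    Returns a log of (event, detail) tuples.
    """
    log: List[Tuple[str, object]] = []
    for attempt in range(max_restarts + 1):
        log.append(("start", attempt))
        flagged = False
        for thought in start_agent():
            if guard(thought):
                log.append(("kill", thought))
                flagged = True
                break  # stop consuming the stream: stands in for killing the process
            log.append(("ok", thought))
        if not flagged:
            log.append(("done", attempt))
            return log
    log.append(("gave_up", max_restarts))
    return log
```

The restart cap matters: without it, an agent that goes off the rails the same way every time would be restarted forever.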

> "Claw bots seem to be a weird sort of alternate reality RPG more than a useful tool, so far."

So basically crypto DeFi/Web3/Metaverse delusion redux

They're 100% fun. There's 100% definitely something there that's useful. To strain the dog analogy: if you were a professional dog trainer, or if the dog was exceptionally well trained, then there's a place for it in your life. It can probably be used safely, but that would require extraordinary effort, either sandboxing it so totally that it's more or less just the chatbot, or spending a lot of time building the environment it can operate in with extreme guardrails.

So yeah, a whole lot of people will play with powerful technology that they have no business playing with and will get hurt, but also a lot of amazing things will get done. I think the main difference between the crypto delusion stuff and this is that AI is actually useful, it's just legitimately dangerous in ways that crypto couldn't be. The worst risks of crypto were like gambling - getting rubber hosed by thugs or losing your savings. AI could easily land people in jail if things go off the rails. "Gee, I see this other network, I need to hack into it, to expand my reach. Let me just load Kali Linux and..." off to the races.

web 4.0 here we come
I beg to differ. I took one, defanged it (well, I let it keep the claw in the name), and turned it into a damn useful self-modifiable IDE: https://github.com/rcarmo/piclaw

Yes, it has cron and will do searches for me and checks on things and does indeed have credentials to manage VMs in my Proxmox homelab, but it won't go off the rails in the way you surmise because it has no agency other than replying to me (and only me) and cron.

Letting it loose on random inputs, though... I'll leave that to folk who have more money (and tokens) than sense.

Besides the web UI, what can it do that pi agent in a terminal can't do?
It has a bunch of additional extensions baked in, but the focus is on making Pi usable remotely on any device (starting with a phone). The README and docs have all the info you might want.
Agent psychosis is just as prevalent as AI psychosis
Mega Man Battle Network, but make it creepypasta, but make it real.