Hacker News new | ask | show | jobs
by sneak 124 days ago
OpenClaw is not insecure because it has ports open to the internet. This is an easily solved problem in one line of code (if indeed it even has that bug, which I don’t think it does). Furthermore you’re probably behind NAT.

OpenClaw, as well as the author’s solution, is insecure because it sends the full content of all of your private documents and data to a remote inference API which is logging everything forever (and is legally obligated to provide it to DHS/ICE/FBI/et al without a warrant or probable cause). Better engineering of the agent framework will not solve this. Only better models and asstons of local VRAM will solve this.

You still then have the “agent flipped out and emailed a hallucinated suicide note to all my coworkers and then formatted my drives” problem but that’s less of a real risk and one most people are willing to accept. Frontier models are pretty famously well-behaved these days 99.9% of the time and the utility provided is well worth the 0.1% risk to most people.

4 comments

It‘s not just that - but I complete agree on not using a Personal AI assistant with some cloud service LLM provider.

Anyway, by interacting with the world, the LLM can be manipulated or even hacked by the data it encounters.

Have you used OpenClaw?

My experience has been that it doesn't take input from the world, unless you explicitly ask it to. But I guess that isn't too crazy, if you ask it to look at a website, maybe the website has a hidden prompt.

I guess that's more of a responsibility of the LLM model in the security model.

That said, I don't think the main dev is serious about security, I've listened to the whole Lex Friedman interview, and he talks about wanting to focus on security, but still dismissing security concerns whenever the arise as coming from 'haters', and there's no recognition of insecurity being possibly an inseparable tradeoff of the functional specifications of the product, I think he thinks of security as something you can slap on a product, which is a very basic misconception I see often in developers that get pwned and managers that think of security as a lever they can turn up or down through budget.

LLMs famously can't separate data from commands (what you mean by input) - that's one of their core security issues. Check simonw's lethal trifacta. Agreed on all the other points !
Ok, but system prompts are weighted differently and their context weighting is different.

Additionally, there's non LLM inputs, for one, parameters, but also just good old code:

OpenClaw is designed to programmatically read their "SOUL" often, and not forget it, user messages can also be repeated and researched. Compare that to some website's code, and while it may find a way to persist or infect the SOUL, it would need to be something specialized.

You have to admit that even if technically not different, there's a huge semantic and probabilistic difference between owner compile time input and non-owner runtime inputs.

We're all waiting for some disaster to happen due to the lethal trifecta, but as far as I know it still hasn't happened yet.
IMO if you haven't seen an agent (SOTA) veer off a plan and head towards a landmine you haven't used them long enough. And now with Ralph loops, etc it will just bury it. ClawdBot/MoltBot/OpenClaw is what ~2 months old so "hasn't happened yet" is a bit early to call.

That said, if model performance/accuracy continues to improve exponentially you will be right.

Sorry, looks like I haven't been precise.

I've seen them veer off a plan, and I've seen the posts about an agent accidentally deleting ~, but neither of those meet the definition of the lethal trifecta. I'm also not saying it can't happen - I count myself towards the ones that are waiting for it to happen. The "we" was meant literally.

That being said, I still think it's interesting that it hasn't happened yet. The longer this keeps being true, the lower my prior for this prediction will sink.

The lethal trifecta needs the right cocktail of foolishness to become a major security incident or scam: a millionaire or billionaire, an AI browser such as Comet or Atlas tied to personal email and banking, and any untrusted Reddit post, tweet, or blog.

Chrome will make this a reality sooner with Gemini-powered AI browser forced on all users

> emailed a hallucinated suicide note to all my coworkers and then formatted my drives problem ... most people are willing to accept

Are they though? I mean, I'm running all my agents in -yolo mode but I would never trust it to remain on track for more than one session. There's no real solution to agent memory (yet) so it's incredibly lossy, and so are fast/cheap sub agents and so are agents near their context limits. It's easy to see how "clean up my desktop" ends with a sub-subagent at its context limit deciding to format your hard drive.

Isn't the wasteful sending of every data and their mother the reason why OpenClaw is so useful for many people? I heard something about excessively big context-windows on every single request. So making it more secure, while still using remote LLMs, would mean making it less useful?
Yeah, I find the whole concept a bit of a nonstarter until models that I can run on a single somewhat-normal-consumerish machine (e.g. a Mac Studio) with decent capability and speed have appeared. I’m not interested in sending literally everything across the wire to somebody else’s computers, and unless the AI bubble pops and cheap GPUs start raining down on us I’m not interested in building some ridiculous tower/rackmount thing to facilitate it either.
That's what the Mac minis people are running OpenClaw on are for - access to the Apple ecosystem (iMessage, calendar, etc) + local inferencing