My experience has been that it doesn't take input from the world unless you explicitly ask it to. But I guess that isn't too crazy: if you ask it to look at a website, maybe the website has a hidden prompt.
I guess that's more the LLM's responsibility in the security model.
That said, I don't think the main dev is serious about security. I've listened to the whole Lex Fridman interview, and he talks about wanting to focus on security but still dismisses security concerns whenever they arise as coming from 'haters'. There's no recognition that insecurity may be an inseparable tradeoff of the product's functional specification. I think he sees security as something you can slap onto a product, which is a very basic misconception I often see in developers who get pwned and in managers who think of security as a lever they can turn up or down through budget.
LLMs famously can't separate data from commands (what you mean by input); that's one of their core security issues. Check simonw's lethal trifecta. Agreed on all the other points!
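For anyone who hasn't read it, the trifecta is the combination of three capabilities in one agent: access to private data, exposure to untrusted content, and a way to communicate externally. A minimal sketch of that check, where the class and field names are made up for illustration and not taken from any real agent framework:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    # Hypothetical capability flags, named here only for illustration.
    reads_private_data: bool          # e.g. email, files, credentials
    sees_untrusted_content: bool      # e.g. web pages, inbound messages
    can_communicate_externally: bool  # e.g. HTTP requests, sending mail


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True only when all three risky capabilities are present together."""
    return (
        caps.reads_private_data
        and caps.sees_untrusted_content
        and caps.can_communicate_externally
    )


# An AI browser tied to personal email: all three legs are present.
browser_agent = AgentCapabilities(True, True, True)

# Same agent with no access to private data: one leg is missing.
sandboxed_agent = AgentCapabilities(
    reads_private_data=False,
    sees_untrusted_content=True,
    can_communicate_externally=True,
)
```

The point of the framing is that removing any one leg breaks the exfiltration path, which is a different thing from the model reliably telling data apart from commands.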
Ok, but system prompts are weighted differently from the rest of the context.
Additionally, there are non-LLM inputs: parameters, for one, but also just good old code.
OpenClaw is designed to programmatically re-read its "SOUL" often so that it doesn't forget it, and user messages can also be repeated and searched again. Compare that to some website's code: while it may find a way to persist or infect the SOUL, it would need to be something specialized.
You have to admit that even if they're technically not different, there's a huge semantic and probabilistic difference between owner compile-time input and non-owner runtime input.
IMO, if you haven't seen a SOTA agent veer off a plan and head towards a landmine, you haven't used them long enough. And now with Ralph loops etc., it will just bury it. ClawdBot/MoltBot/OpenClaw is, what, ~2 months old, so "hasn't happened yet" is a bit early to call.
That said, if model performance/accuracy continues to improve exponentially, you will be right.
I've seen them veer off a plan, and I've seen the posts about an agent accidentally deleting ~, but neither of those meets the definition of the lethal trifecta. I'm also not saying it can't happen. I count myself among those who are waiting for it to happen. The "we" was meant literally.
That being said, I still think it's interesting that it hasn't happened yet. The longer this keeps being true, the lower my prior for this prediction will sink.
The lethal trifecta needs the right cocktail of foolishness to become a major security incident or scam: a millionaire or billionaire, an AI browser such as Comet or Atlas tied to personal email and banking, and any untrusted Reddit post, tweet, or blog.
Chrome will make this a reality sooner, with a Gemini-powered AI browser forced on all users.