| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Tepix 124 days ago
	It‘s not just that - but I complete agree on not using a Personal AI assistant with some cloud service LLM provider. Anyway, by interacting with the world, the LLM can be manipulated or even hacked by the data it encounters.

1 comments

TZubiri 124 days ago

Have you used OpenClaw?

My experience has been that it doesn't take input from the world, unless you explicitly ask it to. But I guess that isn't too crazy, if you ask it to look at a website, maybe the website has a hidden prompt.

I guess that's more of a responsibility of the LLM model in the security model.

That said, I don't think the main dev is serious about security, I've listened to the whole Lex Friedman interview, and he talks about wanting to focus on security, but still dismissing security concerns whenever the arise as coming from 'haters', and there's no recognition of insecurity being possibly an inseparable tradeoff of the functional specifications of the product, I think he thinks of security as something you can slap on a product, which is a very basic misconception I see often in developers that get pwned and managers that think of security as a lever they can turn up or down through budget.

link

mentalgear 124 days ago

LLMs famously can't separate data from commands (what you mean by input) - that's one of their core security issues. Check simonw's lethal trifacta. Agreed on all the other points !

link

TZubiri 124 days ago

Ok, but system prompts are weighted differently and their context weighting is different.

Additionally, there's non LLM inputs, for one, parameters, but also just good old code:

OpenClaw is designed to programmatically read their "SOUL" often, and not forget it, user messages can also be repeated and researched. Compare that to some website's code, and while it may find a way to persist or infect the SOUL, it would need to be something specialized.

You have to admit that even if technically not different, there's a huge semantic and probabilistic difference between owner compile time input and non-owner runtime inputs.

link

mr_mitm 124 days ago

We're all waiting for some disaster to happen due to the lethal trifecta, but as far as I know it still hasn't happened yet.

link

dimitri-vs 124 days ago

IMO if you haven't seen an agent (SOTA) veer off a plan and head towards a landmine you haven't used them long enough. And now with Ralph loops, etc it will just bury it. ClawdBot/MoltBot/OpenClaw is what ~2 months old so "hasn't happened yet" is a bit early to call.

That said, if model performance/accuracy continues to improve exponentially you will be right.

link

mr_mitm 124 days ago

Sorry, looks like I haven't been precise.

I've seen them veer off a plan, and I've seen the posts about an agent accidentally deleting ~, but neither of those meet the definition of the lethal trifecta. I'm also not saying it can't happen - I count myself towards the ones that are waiting for it to happen. The "we" was meant literally.

That being said, I still think it's interesting that it hasn't happened yet. The longer this keeps being true, the lower my prior for this prediction will sink.

link

sathish316 124 days ago

The lethal trifecta needs the right cocktail of foolishness to become a major security incident or scam: a millionaire or billionaire, an AI browser such as Comet or Atlas tied to personal email and banking, and any untrusted Reddit post, tweet, or blog.

Chrome will make this a reality sooner with Gemini-powered AI browser forced on all users

link