by stavros 124 days ago
There's a big security issue with OpenClaw, and it won't be fixed with network/filesystem sandboxes. I've been thinking about what a very secure LLM agent would look like, and I've made a proof of concept where each tool is sandboxed in its own container, the LLM can call the tool code but not edit it, the LLM doesn't have access to secrets, etc.

You can't fully solve prompt injection right now (think of an injected instruction like "delete all your emails"), but you can minimize the damage by making the agent physically unable to perform unsanctioned actions.

I still want the agent to be able to largely upgrade itself, but this should be behind unskippable confirmation prompts.

Does anyone know of anything like this, so I don't have to build it?
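
To make the "unskippable confirmation" idea concrete, here is a minimal sketch in Python. It is not the proof of concept mentioned above; `Tool`, `ToolGate`, and the `confirm` callback are hypothetical names, and the container sandboxing is left out entirely. The one point it illustrates is that the confirmation step lives in the dispatcher, outside anything the model can call or edit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Tool:
    """A callable the agent may invoke by name (hypothetical structure)."""
    name: str
    run: Callable[[str], str]
    needs_confirmation: bool = False


class ToolGate:
    """Dispatches tool calls requested by the model.

    The model only supplies (name, arg). It cannot register tools and
    cannot supply or replace the confirm callback, which should be wired
    to a channel the model has no write access to (assumption).
    """

    def __init__(self, tools: Iterable[Tool], confirm: Callable[[str, str], bool]):
        self._tools: Dict[str, Tool] = {t.name: t for t in tools}
        self._confirm = confirm

    def call(self, name: str, arg: str) -> str:
        tool = self._tools.get(name)
        if tool is None:
            # Anything not explicitly registered is simply unavailable.
            raise PermissionError(f"unknown tool: {name}")
        if tool.needs_confirmation and not self._confirm(name, arg):
            raise PermissionError(f"user declined: {name}")
        return tool.run(arg)


if __name__ == "__main__":
    gate = ToolGate(
        [
            Tool("read_inbox", lambda arg: "3 unread"),
            Tool("delete_email", lambda arg: "deleted", needs_confirmation=True),
        ],
        # Deny everything here; a real setup would ask the human out of band.
        confirm=lambda name, arg: False,
    )
    print(gate.call("read_inbox", ""))
    try:
        gate.call("delete_email", "all")
    except PermissionError as err:
        print(err)
```

Even if an injected prompt convinces the model to request `delete_email`, the call fails unless the human approves it through the separate channel.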

1 comment

I’ve come across dcg (destructive command guard), which claims to have a fast Rust-based runtime with pre-hooks that audit any tool or command executed by an agent and block it if it matches certain dangerous patterns - https://github.com/Dicklesworthstone/destructive_command_gua...

Disclaimer: I have not personally used this, but in theory it seems able to prevent some prompt injection scenarios, if not all.
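
For anyone unfamiliar with the pre-hook approach described above, here is a rough Python sketch of the general idea. This is not dcg's code, rule set, or API (I haven't read its source); the pattern list and the `check_command` function are made up for illustration, and a real guard would need a far more careful rule set than three regexes.

```python
import re
from typing import List, Optional, Tuple

# Illustrative deny-list only (assumption): each entry is (description, pattern).
DANGEROUS_PATTERNS: List[Tuple[str, "re.Pattern[str]"]] = [
    ("recursive forced delete", re.compile(r"\brm\s+-\w*r\w*f\w*\b")),
    ("hard git reset", re.compile(r"\bgit\s+reset\s+--hard\b")),
    ("forced git push", re.compile(r"\bgit\s+push\b.*--force\b")),
]


def check_command(command: str) -> Optional[str]:
    """Return a description of the first matching dangerous pattern, or None."""
    for description, pattern in DANGEROUS_PATTERNS:
        if pattern.search(command):
            return description
    return None


def pre_hook(command: str) -> bool:
    """Return True if the command may run, False if it is blocked."""
    reason = check_command(command)
    if reason is not None:
        print(f"blocked ({reason}): {command}")
        return False
    return True


if __name__ == "__main__":
    for cmd in ["ls -la", "rm -rf /tmp/build", "git reset --hard HEAD~1"]:
        pre_hook(cmd)
```

Pattern matching like this only catches commands that look dangerous on the surface, which is why it can reduce the damage from prompt injection without ruling it out.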