by stavros
124 days ago
There's a big security issue with OpenClaw, and it won't be fixed with network/filesystem sandboxes. I've been thinking about what a very secure LLM agent would look like, and I've made a proof of concept where each tool is sandboxed in its own container, the LLM can call the tool code but not edit it, the LLM doesn't have access to secrets, etc. You can't solve prompt injection right now for things like "delete all your emails", but you can minimize the damage by making the agent physically unable to perform unsanctioned actions. I still want the agent to be able to largely upgrade itself, but that should sit behind unskippable confirmation prompts. Does anyone know of anything like this, so I don't have to build it?
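For illustration only, here is a minimal Python sketch of the shape described above. It is not the actual proof of concept. Every name in it (`Tool`, `Broker`, `confirm`, `secret_names`) is hypothetical, and it runs tools in-process, whereas a real version would presumably run each `run` function inside its own container. The point is just the control flow: the LLM side can only name a registered tool and pass arguments, secrets are handed to the tool by the broker and never returned to the caller, and sensitive tools cannot run without a confirmation the LLM has no way to skip.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Tool:
    """A registered tool. frozen=True so the definition cannot be mutated later."""
    name: str
    # run(args, secrets) -> result; assumed to execute in its own sandbox.
    run: Callable[[Dict[str, Any], Dict[str, str]], str]
    needs_confirmation: bool = False
    secret_names: Tuple[str, ...] = ()


class Broker:
    """Sits between the LLM and the tools (hypothetical design)."""

    def __init__(self, secrets: Dict[str, str],
                 confirm: Callable[[str, Dict[str, Any]], bool]) -> None:
        self._secrets = dict(secrets)   # never exposed to the LLM side
        self._confirm = confirm         # asks the human, not the model
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, args: Dict[str, Any]) -> str:
        tool = self._tools.get(name)
        if tool is None:
            raise PermissionError(f"unknown tool: {name}")
        # The confirmation gate lives here, outside anything the model controls.
        if tool.needs_confirmation and not self._confirm(name, args):
            raise PermissionError(f"user declined: {name}")
        # Each tool receives only the secrets it declared.
        secrets = {k: self._secrets[k] for k in tool.secret_names}
        return tool.run(args, secrets)
```

With something like this, a prompt-injected "delete all your emails" still reaches the broker as a tool call, but a tool registered with `needs_confirmation=True` does nothing unless the human approves it.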
Disclaimer: I have not personally used this, but in theory it seems able to prevent some prompt injection scenarios, if not all.