Hacker News
by rollcat 335 days ago
Language models and actors are powerful tools, but I'm kinda terrified by how irresponsibly they are being integrated.

"Prompt injection" is way more scary than "SQL injection"; the latter will just f.up your database, exfiltrate user lists, etc so it's "just" a single disaster - you will rarely get RCE and pivot to an APT. This is thanks to strong isolation: we use dedicated DB servers, set up ACLs. Managed DBs like RDS can be trivially nuked, recreated from a backup, etc.

What's the story with isolating agents? Sandboxing techniques vary with each OS, and provide vastly different capabilities. You also need proper outgoing firewall rules for anything that is accessing the network. So I've been trying to research that, and as far as I can tell, it's just YOLO. Correct me if I'm wrong.


It's just YOLO.

This problem remains almost entirely unsolved. The closest we've got to what I consider a credible solution is the recent CaMeL paper from DeepMind: https://arxiv.org/abs/2503.18813 - I published some notes on that here: https://simonwillison.net/2025/Apr/11/camel/

> It's just YOLO.

I was amused to notice that the Gemini CLI leans into this, with a `--yolo` flag that will skip confirmation from the user before running tools. Or you can press Ctrl-Y while in the CLI to do the same thing.
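To make concrete what such a flag turns off, here is a minimal sketch of a confirmation gate. This is not Gemini CLI's code; `run_tool`, `auto_approve`, and the `confirm` callback are hypothetical names made up for illustration.

```python
from typing import Callable


def run_tool(
    name: str,
    action: Callable[[], str],
    confirm: Callable[[str], bool],
    auto_approve: bool = False,
) -> str:
    """Run a tool only if the user approves, unless auto_approve is set.

    All names here are hypothetical; this only illustrates the pattern.
    """
    # With auto_approve (the "YOLO" mode), the human check is skipped entirely,
    # so anything the model decides to run will run.
    if not auto_approve and not confirm(f"Allow tool '{name}' to run?"):
        return "denied"
    return action()
```

In an interactive CLI, `confirm` would prompt the user; passing `auto_approve=True` removes the only human checkpoint between the model's output and the tool call.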

Interesting! So this is kinda like whole-program static analysis, but the "program" is like eBPF - no loops, no halting problem, etc. This is great for defence in depth (stops the agent from doing the wrong thing), but IMO the process still needs sandboxing (RCE).

I would love to see a cross-platform sandboxing API (to unify some subset of seccomp, AppContainer, App Sandbox, pledge, Capsicum, etc), perhaps just opportunistic/best-effort (fallback to allow on unsupported capability/platform combinations). We've seen this reinvented over and over again for isolated execution environments (Java, JS, browser extensions...), maybe this will finally trigger the push for something system-level, that any program can use.
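Roughly what I have in mind for the opportunistic part, as a sketch only. No such API exists as far as I know; `restrict`, `BACKENDS`, `SandboxResult`, and the restriction names are all hypothetical, and the real backends (seccomp, pledge, Capsicum, ...) are deliberately left unimplemented.

```python
import sys
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class SandboxResult:
    enforced: Set[str] = field(default_factory=set)  # restrictions actually applied
    skipped: Set[str] = field(default_factory=set)   # unsupported here, fell back to allow


# Hypothetical registry: platform -> restriction name -> function that applies it.
# Real entries would wrap seccomp, pledge, Capsicum, etc.; none are provided here.
BACKENDS: Dict[str, Dict[str, Callable[[], None]]] = {}


def restrict(
    requested: List[str],
    platform: str = sys.platform,
    strict: bool = False,
) -> SandboxResult:
    """Apply every requested restriction the platform supports.

    Best-effort by default: unsupported restrictions are reported, not fatal.
    With strict=True, fail before applying anything if something is missing.
    """
    available = BACKENDS.get(platform, {})
    missing = [name for name in requested if name not in available]
    if strict and missing:
        raise RuntimeError(f"unsupported on {platform}: {missing}")

    result = SandboxResult(skipped=set(missing))
    for name in requested:
        if name in available:
            available[name]()  # irreversibly drop the capability
            result.enforced.add(name)
    return result
```

The caller gets back which restrictions were enforced and which were skipped, so a program can decide for itself whether a partial sandbox is acceptable.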

Yeah, the CaMeL approach is mainly about data flow analysis - making sure to track how any sources of potentially malicious instructions flow through the system. You need to add sandboxes to that as well - and the generated code from the CaMeL process needs to run in a sandbox.
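A toy sketch of the data flow idea, to be clear about what "tracking sources" means. This is not CaMeL's implementation; `Tainted`, `from_source`, `combine`, `send_email`, and the policy itself are hypothetical and only illustrate provenance tracking with a check at the tool boundary.

```python
from dataclasses import dataclass
from typing import FrozenSet

TRUSTED_USER = "user"  # assumption: only direct user input counts as trusted


@dataclass(frozen=True)
class Tainted:
    """A value plus the set of sources it was derived from."""
    value: str
    sources: FrozenSet[str]


def from_source(value: str, source: str) -> Tainted:
    return Tainted(value, frozenset({source}))


def combine(template: str, *parts: Tainted) -> Tainted:
    """Build a new value; it inherits the sources of every input."""
    sources = frozenset().union(*(p.sources for p in parts))
    return Tainted(template.format(*(p.value for p in parts)), sources)


class PolicyError(Exception):
    pass


def send_email(to: Tainted, body: Tainted) -> str:
    """Hypothetical tool: the recipient must derive only from trusted input."""
    if not to.sources <= {TRUSTED_USER}:
        raise PolicyError(f"recipient influenced by {sorted(to.sources)}")
    # The body may contain untrusted text; only the recipient is restricted here.
    return f"sent to {to.value}"
```

An address typed by the user passes the check, while an address extracted from a fetched web page carries the `"web"` source and is rejected, no matter what instructions the page contained. The process running this still needs a sandbox, as noted above.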