Hacker News new | ask | show | jobs
by prerok 23 days ago
Thank you for the explanation but I still don't quite get it. Is this code mounted to a separate VM where the agent is running? I mean, how does the sandboxing of agents really work?

The reason I am asking is because if it's not sandboxed on the OS level, then commands it runs may escape the harness sandboxing. Even more problematic can be a command added to some auto running script that will get executed at some point outside of the sandbox (when the developer is doing actions). So, reviewing everything before anything is executed seems like the only safe way to do it. What am I missing?

1 comments

The tool I use currently is OS-level sandboxing (the OS does the sandboxing), not sandboxing built into the harness (like what Codex has turned on by default) or hypervisor-level sandboxing (i.e., the agent sees an OS that is sandboxed or an OS that constitutes the sandbox). To relax or adjust the sandbox, I have to kill the agent and reinvoke the sandbox with a new policy, which then relaunches the agent.

> Even more problematic can be a command added to some auto running script that will get executed at some point outside of the sandbox (when the developer is doing actions).

That's a real potential problem, but unfortunately the default "approve every edit" regime doesn't actually address it, either. In the normal per-command approval process, the approvals are often just suggestions; Claude will do things like silently edit files in "plan mode" anyway, for example.

If you're deeply worried about this particular kind of sandbox escape you probably don't want the agent's checkout to be your usual checkout. Then if you do have some scripts that can run automatically inside a project directory (e.g., via direnv), you just never approve them in the path to the agent's checkout and make sure direnv's state dir is unwritable inside your sandboxes. If you have code inside your project that runs without any user intervention at all, and has no approval process at all so that it will be activated or trusted even on a fresh clone you've never visited or seen before... yikes. That sucks. :(

Anyway if you take the precaution above you can still review edits to those files before they have a chance to run (or just never run them).

One thing suggested by another user in this discussion that sounds like a useful approach to me is also giving the agent a VM from which they can push to a local bare clone or something like that so that's how they emit code to you. That way they're not writing scripts to your box at all.