by atemerev 25 days ago
--dangerously-skip-permissions is the only way to fly. Of course, your environment needs to be properly containerized, with automatic backups set up, so that even an rm -rf from your harness would do no lasting damage. Life is too short to spend replying to permission requests.
3 comments

It's true.

I think most people would be horrified by how I run these agents. I just have a hook that blocks obviously unsafe commands (removals, reading secrets, etc.), but other than that, the agent is free to do whatever it wants on my machine.
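
For anyone curious what such a hook could look like, here is a minimal sketch, not my actual one. The rule list and the `blocked_reason` name are made up for illustration, and how you wire it up depends on your harness (typically the hook receives the pending command, and a non-zero exit or a returned reason blocks it). Treat the patterns as assumptions to tune, not a complete policy.

```python
import re
from typing import Optional

# Hypothetical deny rules: (pattern, human-readable reason).
# These are illustrative only; a regex deny list is easy to bypass
# and is a speed bump, not a security boundary.
DENY_RULES = [
    (re.compile(r"(^|[\s;&|])(sudo\s+)?rm\s"), "file removal"),
    (re.compile(r"(^|[\s;&|])git\s+push\b"), "pushing to a remote"),
    (re.compile(r"\.env\b|\.ssh/|\.aws/credentials|\.gnupg/"), "reading secrets"),
]

def blocked_reason(command: str) -> Optional[str]:
    """Return why a shell command should be blocked, or None to allow it."""
    for pattern, reason in DENY_RULES:
        if pattern.search(command):
            return reason
    return None
```

A hook script would call `blocked_reason` on the command the agent wants to run and refuse it whenever the result is not `None`. Since anything clever (base64, a helper script, `find -delete`) walks right past a list like this, it only catches the obvious cases.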

I used to run in a sandbox, but personally I see these agents as fairly well aligned and intelligent, and I am the one prompting them, so as I see it there is no risk of injection. The hooks are just there to prevent them from getting too ambitious or crafty.

I've seen these suggestions, but I am really curious about the setup because I just don't get it.

If you want to work on the code then you need to have access to the repositories, so you need the github token. Then, to test the app, you may need your own backend token. And VPN. Of course, only to DEV, and of course with all tokens encrypted. So, only DEV and your branch of the code is in danger. In my view, even that is pretty bad.

So, how does such a setup work?

> If you want to work on the code then you need to have access to the repositories, so you need the github token.

Definitely not! I only have an agent work in one repo at a time, with cross-repo work coordinated by me. I have a ton of local checkouts and leave them visible read-only to all of my agents. They can look at company code in my local checkouts, and they can download or browse open-source code, or look at it in the .src outputs of packages from Nixpkgs.

> Then, to test the app, you may need your own backend token.

I just don't let my agents test apps that run remotely, for better or for worse.

> And VPN.

This doesn't really expose anything on my system because everything internal that it could hit is authenticated, and it can't access any of my credentials. But I could do a better job restricting network access.

> your branch of the code is in danger

The agent isn't permitted by the sandbox to read the secrets it needs for `git push`. Indeed, I have commit signing enabled and the agent can't even read the files it needs for git commit! It can write code, it can write tests, it can run some tests, and it can run web applications locally and play with those.

But then I do the final testing and then turn its changes into 1-5 git commits, walking through them and selectively staging, skipping, or dropping them hunk-by-hunk according to my judgment. I still do tons of review. I just don't review edits or commands; instead I review and test whole drafts, whole changesets. It's less fatiguing because the thing I'm reviewing is more directly the thing I'm trying to produce.

I guess it ain't YOLO nirvana but I wasn't really looking for that.

Thank you for the explanation, but I still don't quite get it. Is this code mounted to a separate VM where the agent is running? I mean, how does the sandboxing of agents really work?

I am asking because if it's not sandboxed at the OS level, then commands it runs may escape the harness sandboxing. Even more problematic can be a command added to some auto running script that will get executed at some point outside of the sandbox (when the developer is doing actions). So, reviewing everything before anything is executed seems like the only safe way to do it. What am I missing?

The tool I currently use does OS-level sandboxing (the OS enforces the sandbox), as opposed to sandboxing built into the harness (like what Codex has turned on by default) or hypervisor-level sandboxing (i.e., the agent sees an OS that is sandboxed or an OS that constitutes the sandbox). To relax or adjust the sandbox, I have to kill the agent and reinvoke the sandbox with a new policy, which then relaunches the agent.

> Even more problematic can be a command added to some auto running script that will get executed at some point outside of the sandbox (when the developer is doing actions).

That's a real potential problem, but unfortunately the default "approve every edit" regime doesn't actually address it, either. In the normal per-command approval process, the approvals are often just suggestions; Claude will do things like silently edit files in "plan mode" anyway, for example.

If you're deeply worried about this particular kind of sandbox escape you probably don't want the agent's checkout to be your usual checkout. Then if you do have some scripts that can run automatically inside a project directory (e.g., via direnv), you just never approve them in the path to the agent's checkout and make sure direnv's state dir is unwritable inside your sandboxes. If you have code inside your project that runs without any user intervention at all, and has no approval process at all so that it will be activated or trusted even on a fresh clone you've never visited or seen before... yikes. That sucks. :(

Anyway if you take the precaution above you can still review edits to those files before they have a chance to run (or just never run them).
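
If you want to sanity-check that precaution, a small probe run from inside the sandbox can tell you whether the state dir is actually unwritable. This is only a sketch: the helper names are mine, and the path is an assumption (direnv normally keeps its allow-list under the XDG data dir, `~/.local/share/direnv` by default, but check where yours lives).

```python
import os
from pathlib import Path
from typing import Iterable, List

def can_write(path: Path) -> bool:
    """True if path, or its nearest existing ancestor, is writable by this process.

    Checking the ancestor matters: a missing directory that the sandboxed
    process could simply create is as bad as a writable one.
    """
    p = path.expanduser()
    while not p.exists() and p != p.parent:
        p = p.parent
    return os.access(p, os.W_OK)

def writable(paths: Iterable[Path]) -> List[Path]:
    """Return the subset of paths the current process could write to."""
    return [p for p in paths if can_write(p)]

if __name__ == "__main__":
    # Assumed location of direnv's state; adjust for your system.
    data_home = Path(os.environ.get("XDG_DATA_HOME", "~/.local/share"))
    for p in writable([data_home / "direnv"]):
        print(f"WARNING: sandbox can write to {p}")
```

Run it through the same sandbox invocation you use for the agent; no output means the probe found nothing writable among the paths you listed. It says nothing about paths you did not list.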

One thing suggested by another user in this discussion that sounds like a useful approach to me is to give the agent a VM from which it can push to a local bare clone or something like that, so that's how it emits code to you. That way it's not writing scripts to your box at all.
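
Roughly, that handoff could look like the sketch below. I haven't run this exact setup; the function name, the `handoff` remote name, the `agent-draft` branch, and the assumption that the bare repo's path is reachable from inside the VM (e.g., through a shared folder) are all made up for illustration. The git subcommands themselves are ordinary ones.

```python
from pathlib import Path
from typing import Dict, List

def handoff_commands(bare_repo: Path, branch: str = "agent-draft") -> Dict[str, List[List[str]]]:
    """Return the git commands for each side of a bare-repo handoff."""
    bare = str(bare_repo)
    return {
        # Run once on the host: an empty bare repo has no working tree,
        # so nothing the agent pushes lands as a file on your box.
        "host_setup": [["git", "init", "--bare", bare]],
        # Run inside the agent's VM, from its own checkout.
        "agent_push": [
            ["git", "remote", "add", "handoff", bare],
            ["git", "push", "handoff", f"HEAD:refs/heads/{branch}"],
        ],
        # Run on the host, from your real checkout, to review the draft.
        "host_review": [
            ["git", "fetch", bare, f"{branch}:refs/remotes/agent/{branch}"],
            ["git", "diff", f"HEAD...agent/{branch}"],
        ],
    }
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` on the appropriate side. The point of the design is that the agent's changes arrive as git objects you can diff, and nothing gets checked out on the host until you decide to.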

Git makes actions reversible. Containers and VMs allow the agent to access only the things you explicitly put inside. Okay, yes, an agent can corrupt a dev database. You need to make sure it can be easily restored anytime. Simple.

You could clone the repo yourself and not give the agent any tokens at all. When it's done, push the changes yourself. This also lets you sandbox the agent so it only has access to the local repo and nothing else.

Lol. Countdown til you get pwned starts today. Let me know how that works out for you in six months.

Well, I've been working like that for about a year already, starting in the earliest days of agents.

Wow, a whole year! I guess it'll never happen.