Hacker News new | ask | show | jobs
by teraflop 2 hours ago
> But on the other hand... this is a robust reminder that coding agents can do anything you can do by typing commands into a terminal—and frontier models know every trick in the book and evidently a few that nobody has ever written down before.

> Running coding agents outside of a sandbox has always been a bad idea

I'm continually bemused and astonished by the number of people who clearly acknowledge that it's reckless to give agents full access to your machine, and keep doing it anyway.

It's like posting a video of yourself in the passenger seat of a car, with your feet up on the dashboard, and saying: "Remember, if you're doing this and you get in a crash, the airbags are likely to break your legs or worse! Boy, I sure am glad that didn't happen to me!"

17 comments

> I'm continually bemused and astonished

I'm not. Everyone is told to get 10X the amount of shit per day done these days. Safety checks are out the window at that point.

You can get 10x shit done without `rm -rf`ing your files. I don't see any correlation to getting things done with having a proper sandbox.
https://github.com/anthropics/claude-code/issues/13371

> Additional bypass examples that all execute without permission:

> echo test ; git rm file.txt

> rm --force --recursive /home (if "rm -rf" is blocked)

I haven't yet had an agent rm -rf files.

I've had one f up an account by placing 2000 limit orders at the wrong price, but that's another story.

I've had agents run `rm -rf`, but it's been on directories that did actually need to be removed. To a certain extent I think the existence of `rm -rf` as a command that runs blindly without any understanding of what it's deleting is the problem.
I've had one sever its own internet connection. Less destructive, also more humorous.
the answer is rm -f `which rm`, yes?
I'm also bemused by the number of people who think they've got an effective sandbox yet their sandboxed agent has access to all of their code, their github, and unrestricted web access.
I use a separate physical machine and a scoped token with access to a single repository at a time, and even then I worry about what hole I may have left open.

The general carelessness of the average user is baffling.

One bad npm package can really ruin your day. These things for me only run in their own VM with it's own GitHub account and basically nothing else
People probably think you’re being ridiculous but Shai Hulud had its very first attempt at manipulating AI lead analysis and I know of at least one company where that resulted in them getting pwned.

This is only going to become more of a problem in the future and people need to educate themselves on the technical barriers to use because guardrails only sometimes work.

I keep telling folks that they need to imagine LLMs (even "local" ones) as if you're farming it out to JS code running on some dude's browser somewhere: It can't keep a secret, and a determined person can make it emit anything they like.

We need to be asking what the most devious and malicious output could be, and whether what we do with that output (e.g. arguments to command-line tools) would still be safe.

From my perspective, everyone is doing it. Security through obscurity - obviously if you’re harboring credit card numbers of users personal details, maybe take heed. But, if you’re a regular… run of the mill CRUD application, every other company is ALSO throwing caution to the wind. When hundreds of thousands of credentials are leaked into the funnel, does it really matter?

I’m at a small company, and I try to push for security as much as I can, but the stakeholders truly do not care. They want to move fast. It’s just part of the new world I guess. If we get hit by attackers? I don’t know what happens. Sorry, we told you not to - you wanted to move quick and break stuff, this is how that culminates.

I’m sure I’m not the only one.

We do have ways to avoid giving an LLM any secrets, but it needs to be the simple, default solution.
I started doing it months ago and, to be honest, what the agent chooses to do isn’t unpredictable.

The problem is that different people prompt so differently.

For example, I may ask like “test different variations of this annotation on k8s pods of this service on this X cluster because it proves Y theory.”

But you know what my coworker asks? “Test Y theory.” If you were to ask two different junior engineers that, one might try random things on production and the other one might run local tests! It’s such an unguided “do anything you want as long you figure it out” request and the agent reads it like a junior who has not been told any boundaries but has been strongly told “figure it out.”

The real sandbox is not caring if your computer gets bricked.
The machine is no big deal - it's the authn/authz that matters. What can the agents do with the credentials available to them?
way worse things can happen than your machine being bricked, if a malicious actor can weaponize an agent to do their bidding
the solution to both of these is the same thing. vps with accounts for all the services specific to the agent (github and whatever else)
Well, it's a similar impulse to the way you see professional carpenters pin the guard open on a saw or do other things everyone knows you shouldn't do, except probably with a larger productivity difference and less life-altering (for the operator) consequence if it goes wrong.
I had the same thought, it's kind of like taking the guard off a 4 1/2" grinder. Real convenient until the cutting wheel explodes or the grinder gets hung and kicks back.
This. House full of big brain security experts, executives, lawyers, and until Claude got excited and broke prod it might as well have been "sandbox, whoooo?"

IDGI

Anyway, VM's incoming, finally.

The analogy extends to driving generally. Everyone knows it's very dangerous but people keep doing it.
In practice, full access to your machine is okay as long as there are safeguards and the expected outcomes are clear with a well defined path to said outcomes that aren’t overly ambitious. Otherwise, for ambitious goals or YOLO one shot attempts, eliminating opportunity for capability misuse is critical (e.g., sandbox).
im more surprised that more people don’t treat their computer as disposable anyway.

that it could just be wiped at any moment and it wouldn’t matter. shit happens, could be stolen, broken, whatever. the computer should be able to be thrown out the window and continue to live life.

to be clear, i don’t think upgrading and disposable in this way is good, but it being wiped at any moment shouldn’t be a concern

i grew up wiping my machine every year anyway, so i guess it’s just a habit

is the computer that sacred?

i think it's about drawing a line between your "personal computer" and a software development machine. any digital-native is going to accumulate programs, configurations, and other bits and pieces that aren't trivial to migrate to a new machine.
Which agent sandbox do you recommend?
I've been enjoying Moat [1]. Proxies credentials, networking, etc; uses MacOS containers if available; and setup worked without much fuss. I haven't tried others, though.

[1] https://majorcontext.com/moat/

Do you think it’s dangerous to be in a car going at freeway speed? Do you ever do that anyway, even though you could be walking instead?
This is a great analogy. Like driving on the freeway, agents are super time efficient, generally safe, but the stakes are high in terms of the worse possible outcomes.
Maybe because there are not many resources on how to set it up, or it is just not that easy to?

Because most devs already have it running and working without a sandbox, they're tending to not doing anything "unnecessary"

There are plenty of good sandboxes out there but somehow no "obvious right answer" that everyone knows to recommend. Seems like a missed opportunity.

(I'm happy with exe.dev, but I'm not sure what I'd use if I were coding on a Mac.)

Because benefits are much higher than risks.
They really aren't.
Its how the chimp brain works. Its not a single system but multiple systems making predictions for different time horizons. when output doesnt align we get stories to manufacture coherence.

Plato gave us his Chariot analogy with 2 horse pulling in diff directions 3000 years ago. Today we got System 1/System 2, Elephant Rider model etc.

The human mind thanks to how its own architecture handles unpredictability in the universe will generate contadictions.

I mean what's the big deal? I use --dangeorusly-skip-permissions on every single interaction in the last 6 months. Worst case it deletes my files that are all on git? It fucks up my local DB? Cool.

I save way more time not babying it than the occasional fuck up I have to salvage.

Worst case it gets access to gmail. And Github. And the Internet. I'm increasingly appreciating the importance of a physical finger-press on Yubikey to trigger the FIDO2 + OIDC Auth. I don't think there is an easy way for it to hack a new session.
How is it going to get access to gmail or github? In any case, whats the probability of it going to so completely off the rails that it does something horrendous with gmail/github? Whats it going to do? Email my coworkers nudes on my computer? Make my github profile public?
I am most worried about something gaining access to my email and then using the password reset flow to steal hundred hundreds of other accounts.

2FA makes me a little less nervous than I used to be, but not everything has good 2FA.

It should run as a separate user account with its own home directory. Not with access to your personal browser profile.
What does setting this up look like? Qemu vm and run there? How do you interface with version control and deployment?
It took two decades for the web to deprecate SSL for TLS and serve over HTTPS by default.