| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by cobbal 61 days ago
	That's funny. It told me that blocking "npm run build" was the wrong answer. Maybe it doesn't really under The threat model.

1 comments

dns_snek 61 days ago

That's a great example of how dangerous actions are perceived as innocent. The entire model of approving specific commands is absolutely bonkers.

npm run build = run an arbitrary shell command written in package.json

Meanwhile the agent could have done any of the following without approval:

- edited `package.json` to contain any arbitrary build command

- planted malicious code in `build.js` (called by `npm run build`)

- planted malicious code in `node_modules/xyz/index.js` (imported by `build.js`)

link

nonethewiser 61 days ago

Yup. The most secure computer is one encased in concrete and dropped into the ocean.

link

falcor84 61 days ago

Concrete alone isn't enough, you also need to have it be enclosed in a Faraday Cage.

link

notgenerated 55 days ago

The security layer needs to parse the full agent activity with the context. It watches everything, but only interrupts the human when it matters.

Commands that can run arbitrary code need to be treated differently and can't get escalated in this opaque way.

A large part of the solution should be to drastically reduce the amount of permission approval prompts a user gets. This ensures the ones he does get are evaluated with the same concentration a manager gives a new hire's most consequential decisions.

Most importantly, because we ask him rarely, when we do he feels the accountability. The yes is his.

link

Wirbelwind 61 days ago

that's a great point, and also the problem with relying on a human-in-the-loop to catch these kind of issues when it can be circumvented even if they were perfect

link

amarant 61 days ago

What would a better system look like?

link

SOLAR_FIELDS 61 days ago

Don’t rely on your non deterministic agent and its creators to secure your software. Design defense in depth and trust guardrails that don’t expect Anthropic to vibe good security into existence.

If you start by treating any autonomous actor in your system as an actor with the potential to go rogue the design starts to create itself

link

dns_snek 61 days ago

Agents should make better use of OS sandboxing facilities with finer-grained ACLs.

Less: Do you want to run "npm run build"?

More: "npm run build" tried to read your Chrome cookie database, do you want to allow that?

Some agents like Codex use sandboxing on Linux/MacOS but the permissions are far too coarse - they'll run the command in a relatively strict sandbox and when it fails they'll ask you to allowlist the command as a whole, forever. There should be a new permission prompt every time a command tries to do something new.

Claude suggests (or used to suggest - it's been a while) to allowlist "bash" which completely defeats the point. If you do that the agent can run `bash -c "echo literally anything"`

link

nonethewiser 61 days ago

Not using agents at all. It could edit your code to do something malicious when you run it. Not even once. Not even if the agent has a gun to your head.

link

xigoi 60 days ago

Don’t give a fancy random text generator access to your computer.

link