Hacker News new | ask | show | jobs
by jryio 105 days ago
You must explicitly state what your threat model is when writing about security tooling, isolation, and sandboxing.

This threat model is concerned with running arbitrary code generated by or fetched by an AI agent on host machines which contain secrets, sensitive files, and/or exfoliate data, apps, and systems which should not be lost.

What about the threat model where an agent deletes your entire inbox? Or sends your calendar events to a server after prompt injection? Bank transfers of the wrong amount to the wrong address etc. all these are allowed under the sandboxing model.

We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".

Sandboxes do not solve permission escalation or exfiltration threats.

6 comments

> We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".

Yes 100%, this is the critical layer that no one is talking about.

And I'd go even further: we need the ability to dynamically attenuate tool scope (ocap) and trace data as it flows between tools (IFC). Be able to express something like: can't send email data to people not on the original thread.

You mean like the section which goes into the threat model?

  The Security Model: Design for Distrust

  I wrote about this in Don’t Trust AI Agents: when you’re building with AI agents, they should be treated as untrusted and potentially malicious. Prompt injection, model misbehavior, things nobody’s thought of yet. The right approach is architecture that assumes agents will misbehave and contains the damage when they do…
Don‘t you see the contradiction?

I don’t trust the agent so I sandbox it before I gave it the access data to my mail and bank accounts

To be fair, if you can firewall the whole thing and have a read-only data layer, this could work for some tasks. This could get tricky when it comes to accessing web resources, but the data layer could handle it presumably. The data layer too will need to be sandboxed, I guess, in case it were to download malware.
Correct
I built an agent framework designed from the ground up around policy control (https://github.com/sibyllinesoft/smith-core) and I'm in the process of extracting the gateway from it so people can provide that same policy gated security to whatever agent they want (https://github.com/sibyllinesoft/smith-gateway).

My posts about these aspects of agent security get zero engagement (not even a salty "vibe slop" comment, lol), so ironically security is the thing everyone's talking about, but most people don't know enough to understand what they need.

We've been working on that with tenuo: https://github.com/tenuo-ai/tenuo

Task-scoped "warrants", attenuating with delegation, and enforced cryptographically at tool call.

Macaroons/Biscuits for agents basically.

> We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".

We already have: IAM, WIF, Macaroons, Service Accounts

Ask you resident SecOps and DevOps teams what your company already has available

That's a great question, and it reminds me of something I read today:

https://entropytown.com/articles/2026-03-12-openclaw-sandbox...

The core issue, to me, is that permissions are inherently binary — can it send an email or not — while LLMs are inherently probabilistic. Those two things are fundamentally in tension.