by niyikiza 119 days ago
Good distinction, but I wonder if it's worth going further: context integrity may be fundamentally unsolvable. Agents consume untrusted input by design. Trying to guarantee the model won't be tricked seems like the wrong layer to bet on. What seems more promising is accepting that the model will be tricked and constraining what it can do when that happens. Authorization at the tool boundary, scoped to the task and delegation chain rather than the agent's identity. If a child agent gets compromised, it still can't exceed the authority that was delegated to it. Contain the blast radius instead of trying to prevent the confusion.
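
A minimal sketch of what "authority that can only narrow down the delegation chain" might look like, enforced at the tool boundary rather than inside the model. The names `Grant`, `delegate`, and `call_tool` are hypothetical, made up for illustration, and not taken from any particular product:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical authority record: the tools allowed for one task."""
    task: str
    tools: frozenset

    def delegate(self, task, tools):
        # A child grant is the intersection with the parent's tools,
        # so delegation can only narrow authority, never widen it.
        return Grant(task=task, tools=self.tools & frozenset(tools))


def call_tool(grant, tool):
    # The check runs outside the model: a tricked agent can request
    # anything, but only tools present in its grant are executed.
    if tool not in grant.tools:
        raise PermissionError(f"{tool!r} not delegated for task {grant.task!r}")
    return f"ran {tool}"


root = Grant("triage-inbox", frozenset({"search", "read_file", "send_email"}))
# The child asks for delete_file, which the parent never held, so it is dropped.
child = root.delegate("summarize-thread", {"search", "read_file", "delete_file"})
```

Here `child` ends up with only `search` and `read_file`. A compromised child calling `send_email` or `delete_file` hits `PermissionError` no matter what its prompt was manipulated into asking for.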

(Disclaimer: working on this problem at tenuo.ai)

1 comment

This is exactly right. We went down this path, and the practical implementation ends up looking like capability tokens: short-lived, cryptographically signed credentials that encode what the agent is authorized to do for this specific task.

The key insight: the token isn't just authorization, it's evidence. When you issue an ES256-signed token that says "this agent was scanned for PII, classified as INTERNAL, and is authorized to call [search, read_file] for the next 60 seconds", that token becomes the audit artifact. The auditor doesn't need to trust the agent or the operator; they verify the token chain.
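
As a rough illustration of a short-lived, signed, task-scoped token, here is a sketch that runs on the Python standard library alone. It swaps ES256 for HMAC-SHA256 to avoid third-party dependencies. With a symmetric key the verifier must hold the secret, so a real audit setup would want an asymmetric signature such as ES256. The claim names (`pii_scanned`, `classification`, `tools`, `exp`) and the functions `issue` and `verify` are assumptions for the example:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-key-not-for-production"  # hypothetical shared key


def _b64(data):
    return base64.urlsafe_b64encode(data).decode("ascii")


def _sign(payload):
    return _b64(hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).digest())


def issue(claims, ttl_seconds, now):
    # Embed an absolute expiry so the token is self-describing evidence.
    body = dict(claims, exp=now + ttl_seconds)
    payload = _b64(json.dumps(body, sort_keys=True).encode("utf-8"))
    return payload + "." + _sign(payload)


def verify(token, tool, now):
    payload, _, sig = token.partition(".")
    # Check the signature before trusting or even parsing the payload.
    if not hmac.compare_digest(sig.encode("utf-8"), _sign_bytes(payload)):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if now >= claims["exp"]:
        raise PermissionError("token expired")
    if tool not in claims["tools"]:
        raise PermissionError(f"{tool!r} not in token scope")
    return claims


def _sign_bytes(payload):
    # Same MAC as _sign, computed over the raw payload bytes for comparison.
    mac = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac)


T0 = 1_000_000.0  # fixed clock value so the example is deterministic
token = issue(
    {"pii_scanned": True, "classification": "INTERNAL",
     "tools": ["search", "read_file"]},
    ttl_seconds=60,
    now=T0,
)
claims = verify(token, "search", now=T0 + 30)
```

Verification fails once 60 seconds have passed, if the requested tool is outside the token's list, or if any byte of the payload is altered.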

On "contain the blast radius instead of preventing the confusion": agreed, but you need both. Containment (scoped permissions, delegation chains) handles authorization. But you still need a detection layer for data protection: PII flowing to an external model is a GDPR or EU AI Act violation (definitely in Europe) regardless of whether the agent was "authorized" to make that call. We found deterministic scanning (regex + normalization, not LLM judges) at the proxy layer catches this at ~250ms without the reliability problems of using another model to judge the first one.
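
To make "regex + normalization" concrete, here is a small sketch of a deterministic scanner. The two patterns and the names `normalize` and `scan` are illustrative assumptions, nowhere near a complete PII detector. The point is that normalizing first lets plain regexes catch inputs that were obfuscated with fullwidth characters or zero-width spaces:

```python
import re
import unicodedata

# Illustrative patterns only; a real detector needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def normalize(text):
    # NFKC folds compatibility forms (e.g. fullwidth letters) to plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Drop format characters (category Cf), which include zero-width spaces.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return text.casefold()


def scan(text):
    # Return the sorted names of every pattern found in the normalized text.
    clean = normalize(text)
    return sorted(name for name, pat in PATTERNS.items() if pat.search(clean))


hits = scan("reach me at Jane@Example.COM, ssn 123-45\u200b-6789")
```

The same input always yields the same result, which is the property an LLM judge can't offer. A proxy could block or redact the request whenever `scan` returns a non-empty list.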

The ergonomics point tucnak raised is real too. We use OPA/Rego for the policy layer, with presets so operators don't have to write Rego from scratch: they pick a security posture and tune from there. The governance tax has to be near-zero or teams just bypass it.