| I've been experimenting with autonomous agents that have shell and database access. The standard approach seems to be "put safety guardrails in the system prompt", but that feels like a house of cards, honestly. If a model is stochastic, its adherence to security instructions is also stochastic. I'm looking into building a hard "Action Authorization Boundary" (AAB) that sits outside the agent's context window entirely. The idea is to intercept the tool call, normalize it into an intent, and evaluate that intent against a deterministic YAML policy before execution. A few questions for those building in this space: Canonicalization: How do you handle the messiness of LLM tool outputs? If the representation isn't perfectly canonical, policy bypasses seem trivial. Stateful Intent: How do you handle sequences that are individually safe but collectively risky? For example, an agent reading a sensitive DB (safe) and then making a POST request to an external API (dangerous exfiltration). Latency: Does moving the "gate" outside the model loop add too much overhead for real-time agentic workflows? I've been working on a CAR (Canonical Action Representation) spec to solve this, but I'm curious whether I'm overthinking it or whether there's an existing "firewall for agents" standard I'm missing. |
Different angle than policy-as-YAML. We use cryptographic capability tokens (warrants) that travel with the request. The human signs a scoped, time-bound authorization. The tool itself validates the warrant at execution time; there is no central policy engine in the path.
On your questions:
Canonicalization: The warrant specifies allowed capabilities and constraints (e.g., path: /data/reports/*). The tool checks if the action fits the constraint. No need to normalize LLM output into a canonical representation.
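To make the constraint check concrete, here is a minimal Python sketch of what the tool-side test might look like. The function name `path_allowed` and the choice to collapse `..` segments before matching are my assumptions for illustration, not the actual warrant implementation; the pattern is the `/data/reports/*` example from above.

```python
import posixpath
from fnmatch import fnmatchcase


def path_allowed(requested: str, pattern: str) -> bool:
    """Return True if the requested path fits the warrant's path constraint.

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    # Collapse "." and ".." segments so "/data/reports/../secrets" cannot
    # slip past a prefix-style pattern.
    normalized = posixpath.normpath(requested)
    # Note: fnmatch's "*" also matches "/" characters, so nested paths
    # under the prefix are allowed by this sketch.
    return fnmatchcase(normalized, pattern)


if __name__ == "__main__":
    print(path_allowed("/data/reports/q3.csv", "/data/reports/*"))
    print(path_allowed("/data/reports/../secrets/key", "/data/reports/*"))
```

The point is that the check runs on the concrete argument the tool is about to use, so whatever shape the LLM output had upstream matters less.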
Stateful intent: Warrants attenuate. Authority only shrinks through delegation. You can't escalate from "read DB" to "POST external" unless the original warrant allowed both. A sub-agent can only receive a subset of what its parent had, cryptographically enforced.
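A rough Python sketch of the attenuation rule, leaving the cryptography out entirely. The `Warrant` dataclass, the `attenuate` function, and the capability strings (`db:read`, `http:post`) are hypothetical names chosen for illustration; a real system would enforce this with signatures rather than a plain subset check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Warrant:
    """Illustrative warrant: a set of capabilities plus an expiry time."""
    capabilities: frozenset
    expires_at: float


def attenuate(parent: Warrant, capabilities, expires_at: float) -> Warrant:
    """Derive a child warrant that is never broader than its parent."""
    requested = frozenset(capabilities)
    # The child may only hold a subset of the parent's capabilities.
    if not requested <= parent.capabilities:
        raise PermissionError("child requests capabilities the parent lacks")
    # The child may not outlive the parent.
    if expires_at > parent.expires_at:
        raise PermissionError("child would outlive the parent warrant")
    return Warrant(requested, expires_at)


if __name__ == "__main__":
    root = Warrant(frozenset({"db:read"}), expires_at=100.0)
    child = attenuate(root, {"db:read"}, expires_at=50.0)
    print(child)
    try:
        attenuate(root, {"db:read", "http:post"}, expires_at=50.0)
    except PermissionError as err:
        print("rejected:", err)
```

With a root warrant that only grants `db:read`, no chain of delegations can produce a warrant containing `http:post`.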
Latency: Stateless verification, ~27μs. No control plane calls. The warrant is self-contained: scope, constraints, expiry, holder binding, signature chain. Verification is local.
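For a feel of what "self-contained, verified locally" can mean, here is a stdlib-only Python sketch. Big assumption: it uses an HMAC over the claims as a stand-in for the signature, purely so the example runs without third-party packages. A real deployment would more likely use asymmetric signatures so that verifying tools cannot mint warrants. The function names and claim fields (`holder`, `scope`, `expires_at`) are hypothetical.

```python
import hashlib
import hmac
import json


def _encode(claims: dict) -> bytes:
    # Deterministic serialization so signer and verifier hash the same bytes.
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode()


def sign_warrant(key: bytes, claims: dict) -> dict:
    """Attach an HMAC tag to the claims (stand-in for a real signature)."""
    tag = hmac.new(key, _encode(claims), hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}


def verify_warrant(key: bytes, warrant: dict, holder: str, now: float) -> bool:
    """Check integrity, holder binding, and expiry with no network calls."""
    expected = hmac.new(key, _encode(warrant["claims"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, warrant["tag"]):
        return False  # claims were altered or signed with another key
    claims = warrant["claims"]
    return claims["holder"] == holder and now < claims["expires_at"]


if __name__ == "__main__":
    key = b"demo-key"
    w = sign_warrant(key, {"holder": "agent-a", "scope": ["db:read"], "expires_at": 100})
    print(verify_warrant(key, w, "agent-a", now=50))
    print(verify_warrant(key, w, "agent-a", now=150))
```

Everything the verifier needs is in the token and a local key, which is why no control-plane round trip is required. The timing figure above is for the real implementation, not this sketch.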
The deeper issue with policy engines: they check rules against actions, but they can't verify derivation. When Agent B acts, did its authority actually come from Agent A? Was it attenuated correctly?
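A small Python sketch of what "verifying derivation" could look like, again with signatures omitted. It assumes each link in the chain is a dict with hypothetical `issuer`, `holder`, and `scope` fields, ordered from the root warrant to the one being presented.

```python
def chain_is_valid(chain: list) -> bool:
    """Check that each warrant was issued by the previous holder
    and that scope never grows along the chain."""
    if not chain:
        return False  # no warrant presented at all
    for parent, child in zip(chain, chain[1:]):
        # Agent B's authority must actually come from Agent A.
        if child["issuer"] != parent["holder"]:
            return False
        # Attenuation: the child's scope must be a subset of the parent's.
        if not set(child["scope"]) <= set(parent["scope"]):
            return False
    return True


if __name__ == "__main__":
    root = {"issuer": "human", "holder": "agent-a", "scope": ["db:read", "fs:read"]}
    good = {"issuer": "agent-a", "holder": "agent-b", "scope": ["db:read"]}
    bad = {"issuer": "agent-a", "holder": "agent-b", "scope": ["db:read", "http:post"]}
    print(chain_is_valid([root, good]))
    print(chain_is_valid([root, bad]))
```

A rule-matching engine sees only the final action; a check like this also looks at how the authority to perform it was handed down.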
Wrote about why capabilities are the only model that survives dynamic delegation: https://niyikiza.com/posts/capability-delegation/