
by claytonia 61 days ago

Nice project. "Pattern-based injection detection is easily bypassed by obfuscated inputs" is a sharp finding. It got me thinking about two different ways to do agent security.

What you're building is content security: scan the request, decide if it's safe, block it if not. This matters, but it's an arms race. Every rule eventually gets bypassed.

But I think there may be another approach that works at a lower level: don't scan the content at all. Instead, give each agent a set of capability tokens. No token for bash? Can't call bash. No token for file read? Can't read files. Doesn't matter what the prompt says. This is how OS kernels work. You don't check if a syscall looks malicious. You check if the process has the right to make it.
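A minimal sketch of the capability idea, in Python. Everything here is hypothetical: `Capability`, `Agent`, `CapabilityError` and the two stand-in tools are illustrative names, not any real framework's API. Each token is an opaque handle to one tool. The agent can only reach tools through the handles it was given, and the check never inspects the argument:

```python
from typing import Callable, Dict


class CapabilityError(PermissionError):
    """Raised when an agent asks for a tool it holds no token for."""


class Capability:
    """Hypothetical opaque token: holding it is the only route to the tool."""

    def __init__(self, name: str, tool: Callable[[str], str]) -> None:
        self.name = name
        self._tool = tool  # reachable only through invoke()

    def invoke(self, arg: str) -> str:
        return self._tool(arg)


class Agent:
    def __init__(self, name: str, tokens: Dict[str, Capability]) -> None:
        self.name = name
        self._tokens = dict(tokens)  # the agent's entire reachable world

    def call(self, tool: str, arg: str) -> str:
        # No content inspection: the only question is whether a token was
        # granted, like a kernel checking a process's rights, not the syscall.
        token = self._tokens.get(tool)
        if token is None:
            raise CapabilityError(f"{self.name} holds no token for {tool!r}")
        return token.invoke(arg)


# Hypothetical stand-ins for real tools.
read_file = Capability("file_read", lambda path: f"contents of {path}")
run_bash = Capability("bash", lambda cmd: f"ran {cmd}")

# The reader agent is granted file_read only; run_bash is never handed to it.
reader = Agent("reader", {"file_read": read_file})
```

`reader.call("file_read", "notes.txt")` returns `"contents of notes.txt"`, while `reader.call("bash", ...)` raises `CapabilityError` whatever the command text is, because no bash token was ever handed over. Real capability systems make tokens unforgeable; Python attribute privacy is only a convention, so treat this as a model of the idea rather than an enforcement boundary.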

The two actually work at different levels:

* Capabilities block whole categories of action. No arms race. But they don't look at content.
* Content scanning catches bad stuff within allowed actions. Good for defense in depth.
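To make the layering concrete, here is a rough sketch assuming a hypothetical `guarded_call` wrapper and a deliberately naive regex blocklist (neither comes from the project under discussion). The capability check runs first and rejects whole tools; the content scan only ever sees calls that already passed it:

```python
import re
from typing import Callable, Dict, FrozenSet


class Denied(Exception):
    """Records which layer rejected the call."""

    def __init__(self, layer: str, reason: str) -> None:
        super().__init__(f"[{layer}] {reason}")
        self.layer = layer


# Hypothetical, deliberately naive content rules.
BLOCKLIST = (
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"\.\./"),  # literal path traversal only
)


def guarded_call(granted: FrozenSet[str], tool: str, arg: str,
                 tools: Dict[str, Callable[[str], str]]) -> str:
    # Layer 1: capability check. Blocks whole categories, ignores content.
    if tool not in granted:
        raise Denied("capability", f"no token for {tool!r}")
    # Layer 2: content scan, applied only to actions that survived layer 1.
    for pattern in BLOCKLIST:
        if pattern.search(arg):
            raise Denied("content", f"argument matched {pattern.pattern!r}")
    return tools[tool](arg)


# Hypothetical tool table and grant for one agent.
TOOLS: Dict[str, Callable[[str], str]] = {
    "file_read": lambda p: f"contents of {p}",
    "bash": lambda c: f"ran {c}",
}
GRANTED = frozenset({"file_read"})
```

Calling `bash` fails at the capability layer no matter what the command says, and `file_read` on `"../etc/passwd"` fails at the content layer. An encoded variant such as `"..%2Fetc/passwd"` slips past the regex, which is the arms race from the original finding showing up in miniature; the capability layer is what still keeps bash out of reach.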

Most agent security work I see is on the content side. The capability side may be harder to build, but it removes the biggest default risk: an agent that can reach any tool, any API, or any file.

Your OPA integration is an interesting step, and it may be one toward the capability side.