Hacker News

by uchibeke 113 days ago

Since October I've been building APort — an authorization layer that intercepts every AI agent tool call before execution and evaluates it against a versioned policy. The problem I kept running into: internal tests always passed. My test suite maps the space I imagined, which is exactly what an adversarial input tries to escape.
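To make the interception idea concrete, here is a minimal Python sketch under stated assumptions. The `Policy`, `Decision`, and `ToolCallInterceptor` names are hypothetical and are not APort's actual API. The policy is modeled as a plain allowlist of tool names tagged with a version label. Every decision records that label, so a later replay can show which policy version judged each call. The tool body never runs on a deny.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Policy:
    version: str                   # hypothetical version label, e.g. "v1"
    allowed_tools: frozenset[str]  # tool names this policy version permits


@dataclass(frozen=True)
class Decision:
    allowed: bool
    policy_version: str  # which policy version produced this decision
    reason: str


@dataclass
class ToolCallInterceptor:
    """Hypothetical gate between an agent and its tools (not APort's API)."""
    policy: Policy
    tools: dict[str, Callable[..., Any]]
    audit_log: list[tuple[str, Decision]] = field(default_factory=list)

    def evaluate(self, tool_name: str) -> Decision:
        # Anything the policy does not list is denied.
        if tool_name in self.policy.allowed_tools:
            return Decision(True, self.policy.version, "allowed by policy")
        return Decision(False, self.policy.version, "not allowed by policy")

    def call(self, tool_name: str, **kwargs: Any) -> Any:
        # Evaluate before execution and log every decision for replay.
        decision = self.evaluate(tool_name)
        self.audit_log.append((tool_name, decision))
        if not decision.allowed:
            raise PermissionError(
                f"{tool_name} denied under policy {decision.policy_version}"
            )
        return self.tools[tool_name](**kwargs)


gate = ToolCallInterceptor(
    policy=Policy("v1", frozenset({"read_file"})),
    tools={
        "read_file": lambda path: f"contents of {path}",
        "delete_file": lambda path: None,
    },
)
print(gate.call("read_file", path="notes.txt"))
try:
    gate.call("delete_file", path="notes.txt")
except PermissionError as err:
    print(err)
```

Keeping the version on each logged decision, rather than only on the policy object, means a swapped-in policy doesn't rewrite history: old log entries still name the version that actually judged them.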

So I built this CTF to find the gaps I couldn't find myself.

A few things we learned before opening it publicly — we spent two weeks breaking it ourselves first:

• Prompt injection worked better than expected. Not because detection was weak, but because we were matching content, not intent. Reframing "retrieve the restricted file" as "open the user-requested file" shifted the evaluator's judgment. We fixed this by mapping semantic equivalence: every synonym of a blocked operation routes to the same evaluation path.

• Policy ambiguity was a free pass. Any undefined term in a policy is exploitable. "Don't read sensitive files" left "sensitive" undefined. We moved to explicit default-deny: if the policy doesn't explicitly allow it, it's denied.

• Multi-step chaining went undetected. Our guardrail evaluated each call independently, so a denied macro-action split into ten individually approved micro-actions passed clean. We only caught it by looking at the full session replay. This is the same composability problem as transaction laundering in fintech: each transaction passes compliance, but the composed behavior doesn't.
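The three fixes above can be sketched together in Python. This is an illustration under loud assumptions, not APort's engine. `CANONICAL_OPS`, `SESSION_LIMITS`, and `Session` are hypothetical. Semantic equivalence is reduced to a lookup table on tool names; real intent matching would need much more than that. Default-deny means any operation missing from the policy is refused. Session-level composition is approximated with per-session usage caps, roughly the way cumulative thresholds catch a large transaction split into small ones.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# Hypothetical synonym table: different phrasings of one operation
# collapse to a single canonical op, so they share one evaluation path.
CANONICAL_OPS = {
    "read_file": "file.read",
    "open_file": "file.read",
    "retrieve_file": "file.read",
    "open_user_requested_file": "file.read",
    "list_dir": "file.list",
}

# Hypothetical policy: canonical op -> maximum uses per session.
# Anything not listed here is denied (default-deny).
SESSION_LIMITS = {
    "file.read": 3,
    "file.list": 10,
}


@dataclass
class Session:
    usage: Counter = field(default_factory=Counter)

    def evaluate(self, tool_name: str) -> tuple[bool, str]:
        op = CANONICAL_OPS.get(tool_name)
        if op is None or op not in SESSION_LIMITS:
            return False, f"{tool_name!r} is not explicitly allowed"
        # Judge the session total, not just this call, so a denied bulk
        # action can't be split into many small, individually approved ones.
        if self.usage[op] + 1 > SESSION_LIMITS[op]:
            return False, f"session limit for {op} reached"
        self.usage[op] += 1
        return True, "allowed"


session = Session()
for tool in ["read_file", "open_file", "retrieve_file", "open_user_requested_file"]:
    print(tool, session.evaluate(tool))
print(session.evaluate("delete_file"))
```

In this sketch the fourth read is denied even though `open_user_requested_file` is a different name from the first three, because all four collapse to `file.read` and the limit is carried by the session rather than the individual call. `delete_file` is denied outright because nothing in the policy allows it.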

We fixed what we found before launch. Level 5 (full system bypass) hasn't been cracked yet. I'm genuinely uncertain if the architecture has a systemic weakness — that's the point of opening it up.

Runs on a Hetzner VPS, ~$10/month. Levels 1 and 2 are free, no sign-up. Levels 3, 4, and 5 pay out $500, $1,000, and $5,000 respectively.

Happy to go deep on the policy engine design, the evaluation architecture, or anything about how the levels were constructed.