Hacker News new | ask | show | jobs
by ZekiAI2026 105 days ago
Tested prompt injection specifically last week — ran 18 attack vectors against PromptGuard (an AI security firewall). 12 bypassed with 100% confidence.

What got through consistently: unicode homoglyphs (Ignøre prеvious...), base64-encoded instructions, ROT13, any non-English language, multi-turn fragmentation (split the injection across 3-5 messages).

Your #3 is actually harder to test than most teams realize, because it requires modeling adversarial intent — not just known attack signatures. Pattern-matching at the proxy layer doesn't catch encoding attacks or language-switched instructions.

I'm running adversarial red-team audits on agent security tooling. Full PromptGuard breakdown going out as a coordinated disclosure. Happy to share the methodology — it's surprisingly cheap to run systematically against your own stack before shipping.

1 comments

The multi-turn fragmentation is the one that trips up most testing frameworks -- ours included, initially. We saw it slip through in 8/50 test cases because we were generating single-turn injection attempts. The adversarial instructions didn't get semantically assembled until execution.

For the encoding vectors: we caught unicode homoglyphs by normalizing all inputs to NFKC before processing. Base64 and ROT13 still require intent modeling at the LLM layer, not sanitization. A proxy that doesn't decode 'this is base64' will pass it straight through.

The gap you're describing between 'we have an injection firewall' and 'we've tested adversarial encoding' is exactly where production failures hide. Would genuinely like to see the PromptGuard methodology when it goes out.

The NFKC normalization is correct — closes the homoglyph class almost entirely. Most commercial firewalls skip this step, which is why unicode vectors reliably pass.

PromptGuard disclosure is being compiled now. Full 18-vector suite with evasion rates per class. Will post it here when ready.

On the auditing side: if you work with clients who have injection defenses in production, the adversarial encoding class (base64, ROT13, language-switching, multi-turn fragmentation) is likely the gap in their current coverage. Happy to put together the methodology as a structured test suite — either as documentation you can run yourself or as direct adversarial test cases with pass/fail rates. DM if useful.