Show HN: I built a zero-log PII redaction API – no AI, just regex and checksums

Show HN: I built a zero-log PII redaction API – no AI, just regex and checksums (pii-firewall-edge-web.vercel.app)

1 points by Raviteja_ 170 days ago

Hi HN,

I built PII Firewall because I got tired of watching "privacy" APIs secretly pipe user data to cloud AI models. If you're using GPT/Claude to redact PII, you're literally giving the AI your PII.

What makes this different:

- Zero AI – deterministic regex + 30 checksum validators (Luhn, Verhoeff, Mod 11/97)
- Zero storage – processes on Cloudflare edge, no logs, no persistence
- 152 PII types – SSN, Aadhaar, 50+ country IDs, 20 API key formats, crypto wallets
- Two modes: `/fast` (2-5ms) for structured PII, `/deep` (5-15ms) adds names/addresses via 2000+ name gazetteer

The technical approach:

Instead of ML inference, I use combined V8-optimized regex with heuristic pre-scanning. Clean text (90% of requests) skips pattern matching entirely. For IDs that require it, I implemented full checksum validation:
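A "combined" regex can be pictured as one alternation of named groups, so the text is scanned once and the name of the group that matched gives the PII type. The sketch below is in Python for illustration only (the service is described as pure JS); the three patterns and the `redact` helper are simplified, hypothetical stand-ins, not the service's actual rules.

```python
import re

# Hypothetical, deliberately simplified patterns (assumptions for illustration).
PATTERNS = {
    "EMAIL": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "CARD": r"\b(?:\d[ -]?){12,18}\d\b",
}

# One alternation of named groups: a single scan over the text, and
# match.lastgroup reports which alternative (PII type) matched.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{rx})" for name, rx in PATTERNS.items())
)


def redact(text: str) -> str:
    """Replace every match with a placeholder named after its PII type."""
    return COMBINED.sub(lambda m: f"[{m.lastgroup}]", text)


print(redact("Mail bob@example.com, SSN 123-45-6789"))
```

Because the alternatives are tried in order at each position, the more specific patterns should come before the broader ones.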

- Credit cards: Luhn
- Indian Aadhaar: Verhoeff
- Chinese ID: ISO 7064 Mod 11
- Brazilian CPF/CNPJ: Dual Mod 11
- IBAN: Mod 97
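Two of the checks in this list, Luhn and IBAN Mod 97, are short enough to sketch. This is a Python illustration of the standard algorithms, not the service's code; the function names and the minimum-length guard in `luhn_valid` are assumptions.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 12:  # assumed lower bound for card-like numbers
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """IBAN Mod 97: move the first four characters to the end, map A..Z to
    10..35, and require the resulting integer to be congruent to 1 mod 97."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34) or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


print(luhn_valid("4111 1111 1111 1111"))
print(iban_valid("GB82 WEST 1234 5698 7654 32"))
```

Both checks only confirm that a candidate is internally consistent; they say nothing about whether the card or account actually exists.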

Runs on Cloudflare Workers (pure JS, no WASM), so no cold starts.

Why I'm sharing:

Enterprise PII solutions cost $50K+/year. I wanted to make this accessible to indie devs, startups, and anyone building AI features who doesn't want to become a data liability. The $5/mo tier covers most use cases.

Would love feedback on the detection coverage or edge cases I might be missing.

3 comments

Raviteja_ 170 days ago

Quick technical notes for HN:

Why no AI?

The irony of sending PII to an AI model to detect PII is lost on most "privacy" APIs. This is pure algorithmic detection – the same approach your credit card company uses to validate card numbers.

What's validated (not just pattern-matched):
- Credit cards → Luhn checksum
- Aadhaar → Verhoeff (the algorithm that catches all single-digit errors and all adjacent transpositions)
- IBAN → Mod 97 (the same check banks use)
- Singapore NRIC → Mod 11 with offset
- Brazilian CPF → Dual Mod 11
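For the two less familiar checks in this list, here is a minimal Python sketch of the standard Verhoeff algorithm and the CPF dual Mod 11 check. The function names are hypothetical, the all-identical-digits rejection in `cpf_valid` is a common convention rather than something stated in the thread, and none of this is the service's code.

```python
# Verhoeff tables: D is multiplication in the dihedral group D5,
# P is the position-dependent permutation.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2], [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_valid(number: str) -> bool:
    """True if the digit string (check digit last) passes Verhoeff."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    c = 0
    for i, digit in enumerate(reversed(digits)):
        c = D[c][P[i % 8][digit]]
    return c == 0


def cpf_valid(cpf: str) -> bool:
    """CPF: two Mod 11 check digits over the first 9 and first 10 digits."""
    digits = [int(c) for c in cpf if c.isdigit()]
    if len(digits) != 11 or len(set(digits)) == 1:
        return False  # wrong length, or repeated-digit strings like 111.111.111-11
    for n in (9, 10):
        weights = range(n + 1, 1, -1)  # 10..2, then 11..2
        total = sum(d * w for d, w in zip(digits[:n], weights))
        if (total * 10) % 11 % 10 != digits[n]:
            return False
    return True


print(verhoeff_valid("2363"), verhoeff_valid("2633"))
print(cpf_valid("529.982.247-25"))
```

The transposed input `2633` fails where `2363` passes, which is the property that makes Verhoeff stricter than a plain modulus check.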

Latency breakdown:
- Heuristic scan: O(n) single pass for trigger characters (@, -, digits)
- Pattern matching: Only runs if triggers found
- Validation: Only on pattern matches
- Total: 2-5ms for /fast, 5-15ms for /deep
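The staging described in this breakdown can be sketched as three gates, each cheaper than the next. This is a Python illustration under assumptions: the trigger set comes from the comment, but the two patterns, the placeholder strings, and the `redact` and `luhn_ok` helpers are hypothetical.

```python
import re

TRIGGERS = set("@-0123456789")  # trigger characters named in the comment

# Hypothetical, simplified patterns (assumptions for illustration).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(candidate: str) -> bool:
    """Luhn checksum over the digits of a card-shaped candidate."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    # Stage 1: O(n) pre-scan; clean text returns before any regex runs.
    if not any(ch in TRIGGERS for ch in text):
        return text
    # Stage 2: pattern matching, reached only when a trigger was seen.
    text = EMAIL.sub("[EMAIL]", text)
    # Stage 3: validation runs only on card-shaped matches.
    return CARD.sub(
        lambda m: "[CARD]" if luhn_ok(m.group()) else m.group(), text
    )


print(redact("hello world"))
print(redact("card 4111 1111 1111 1111"))
print(redact("ref 1234 5678 9012 3456"))
```

The last input is card-shaped but fails Luhn, so it is left untouched.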

False positive mitigation:
- "Order ID: 123-45-6789" won't trigger SSN (negative context)
- Timestamps won't match phone patterns (separator requirements)
- Random 16-digit numbers won't trigger credit card (Luhn must pass)
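One plausible reading of "negative context" is a lookback window: an SSN-shaped match is skipped when a cue word such as "order" appears just before it. The Python sketch below assumes that reading; the cue list, the 24-character window, and the `redact_ssn` name are all hypothetical.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical cue words suggesting the number is not an SSN.
NEGATIVE_CUES = re.compile(r"\b(?:order|invoice|ticket|tracking)\b", re.IGNORECASE)


def redact_ssn(text: str, window: int = 24) -> str:
    """Redact SSN-shaped numbers unless a negative cue precedes them."""

    def replace(m: re.Match) -> str:
        # Look only at the few characters immediately before the match.
        before = text[max(0, m.start() - window):m.start()]
        if NEGATIVE_CUES.search(before):
            return m.group()  # negative context: leave the number alone
        return "[SSN]"

    return SSN.sub(replace, text)


print(redact_ssn("Order ID: 123-45-6789"))
print(redact_ssn("My SSN is 123-45-6789"))
```

A short window keeps the check cheap, at the cost of missing cue words that sit further back in the sentence.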

comfytummyedgy 167 days ago

We integrated AI into our product recently and are looking for a few ways to protect our users' data. Definitely going to check it out and try it in our workflow.

max_aucube 170 days ago

The project is great, honestly. But I put a space in the email by mistake, and it wasn't censored.

Raviteja_ 169 days ago

Great catch! Emails with spaces around @ (like "test @ example.com") slip through. This is a classic obfuscation bypass.

The current pattern intentionally matches only RFC 5321-compliant emails (no spaces). Adding support for spaced variants is a trade-off: we would catch more bypass attempts but also increase false positives on text like "send @ 5pm". I'll add this to the roadmap. Appreciate the feedback! This is exactly the kind of edge case I need to hear about to make the API better.
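The trade-off can be made concrete with two patterns. This is a Python sketch under assumptions: `STRICT` and `SPACED` are hypothetical patterns, not the service's, and requiring a dotted domain is only one possible way to limit the false positives.

```python
import re

LOCAL = r"[A-Za-z0-9._%+-]+"
# Dotted domain ending in an alphabetic TLD (an assumption for this sketch).
DOMAIN = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"

STRICT = re.compile(LOCAL + "@" + DOMAIN)          # no whitespace allowed
SPACED = re.compile(LOCAL + r"\s*@\s*" + DOMAIN)   # tolerate spaces around "@"


def redact(text: str, pattern: re.Pattern) -> str:
    return pattern.sub("[EMAIL]", text)


# The strict pattern misses the spaced obfuscation; the spaced one catches it.
print(redact("mail test @ example.com", STRICT))
print(redact("mail test @ example.com", SPACED))

# Requiring a dotted domain keeps "send @ 5pm" clean...
print(redact("send @ 5pm", SPACED))

# ...but prose with a missing space after a period still looks like an email.
print(redact("meet @ noon.Bring snacks", SPACED))
```

The last case shows why loosening the pattern needs extra guards, such as a TLD allowlist, before shipping.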
