by jensbontinck 110 days ago

Yes, but most are doing it wrong.

The common approaches I've seen:

1. Policy-only ("don't paste sensitive data into ChatGPT"): doesn't work, people do it anyway.
2. Network-level blocking (block api.openai.com): kills legitimate use cases.
3. DLP on the network layer: can't inspect the semantic content, just sees HTTPS traffic.

What actually works is a proxy-based approach: all LLM API traffic routes through a governance layer that scans the content of every request. PII gets tokenized before reaching the model (SSN becomes [SSN_TOKEN_1], model processes it fine, token gets restored on the way back). Secrets get blocked. Data classification determines which model can be used — RESTRICTED data stays on-prem or goes to a private deployment, PUBLIC data can go to any provider.
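To make the request-side flow concrete, here is a minimal sketch of tokenize, block, and route. It is an illustration under assumptions, not the actual implementation: the SSN and secret regexes, the `ROUTES` table, and the function names (`tokenize_pii`, `restore_pii`, `route`, `govern_request`) are all hypothetical, and failing closed on an unknown classification is my own choice here.

```python
import re

# Assumed patterns for illustration only; a real scanner would cover far more.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SECRET_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")  # AWS-style access key ID shape

# Hypothetical routing table: classification label -> deployment target.
ROUTES = {"RESTRICTED": "on-prem", "PUBLIC": "any-provider"}


class SecretFound(Exception):
    """Raised when a request contains something that looks like a secret."""


def tokenize_pii(text):
    """Replace each distinct SSN with a stable placeholder token.

    Returns the sanitized text and a token -> original value mapping.
    """
    token_for_value = {}
    value_for_token = {}

    def replace(match):
        value = match.group(0)
        if value not in token_for_value:
            token = f"[SSN_TOKEN_{len(token_for_value) + 1}]"
            token_for_value[value] = token
            value_for_token[token] = value
        return token_for_value[value]

    return SSN_RE.sub(replace, text), value_for_token


def restore_pii(text, mapping):
    """Swap placeholder tokens in the model's response back to the originals."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


def route(classification):
    """Pick a target; unknown labels fall back to the most restrictive one."""
    return ROUTES.get(classification, "on-prem")


def govern_request(prompt, classification):
    """Block secrets, tokenize PII, and choose where the request may go."""
    if SECRET_RE.search(prompt):
        raise SecretFound("request blocked: possible credential in prompt")
    safe_prompt, mapping = tokenize_pii(prompt)
    return safe_prompt, mapping, route(classification)
```

The mapping stays inside the proxy for the lifetime of the request, so the provider only ever sees `[SSN_TOKEN_1]` and the caller only ever sees the original value.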

The technical challenge is doing this without killing latency or breaking streaming. We got it down to ~250ms overhead with deterministic scanning (regex + normalization, not a second LLM). For streaming responses, you buffer the full response, scan it, then re-chunk — PII never leaks through stream chunks.
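A rough sketch of the buffer, scan, re-chunk step, to show why per-chunk scanning isn't enough: a value split across two chunks matches in neither. Assumptions here: the SSN regex, the `[REDACTED_SSN]` placeholder, the `chunk_size` default, and the function name `scan_and_rechunk` are all made up for illustration, and redaction stands in for whatever the response-side scan actually does.

```python
import re
from typing import Iterable, Iterator

# Assumed pattern for illustration only.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_and_rechunk(chunks: Iterable[str], chunk_size: int = 16) -> Iterator[str]:
    """Buffer the whole streamed response, scan it once, then re-emit chunks."""
    full_response = "".join(chunks)  # nothing is forwarded until the stream ends
    clean = SSN_RE.sub("[REDACTED_SSN]", full_response)
    for start in range(0, len(clean), chunk_size):
        yield clean[start:start + chunk_size]


# The SSN is split across two upstream chunks.
upstream = ["The SSN is 123-", "45-6789."]

# Scanning chunk by chunk finds nothing, so the value would leak.
naive = [SSN_RE.sub("[REDACTED_SSN]", chunk) for chunk in upstream]

# Scanning the buffered response catches it.
safe = "".join(scan_and_rechunk(upstream))
```

The cost of this design is that the client sees no output until the upstream response is complete, so time to first token becomes time to full response plus the scan.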

The org challenge is that engineering teams will route around any governance that adds friction. The proxy approach works because it's transparent: agents point at "proxy.server.com" instead of api.openai.com and everything else stays the same. It's an env var change, with no code changes.
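As a sketch of what "env var change, no code changes" looks like from the client side, assuming the client resolves its base URL from `OPENAI_BASE_URL` (the official OpenAI Python SDK reads that variable, as far as I know). The `/v1` path on the proxy and the helper name `build_chat_request` are assumptions for illustration; the request is only built here, never sent.

```python
import os
import urllib.request

DEFAULT_BASE_URL = "https://api.openai.com/v1"


def build_chat_request(payload: bytes, env=os.environ) -> urllib.request.Request:
    """Build a chat completions request against whatever base URL the env names."""
    base_url = env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    api_key = env.get("OPENAI_API_KEY", "")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Same code path, different environment: direct vs. through the proxy.
direct = build_chat_request(b"{}", env={})
proxied = build_chat_request(
    b"{}", env={"OPENAI_BASE_URL": "https://proxy.server.com/v1"}
)
```

In deployment that amounts to setting `OPENAI_BASE_URL=https://proxy.server.com/v1` for the agent's process and leaving the application code alone.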