by jensbontinck
110 days ago
Yes, but most are doing it wrong. The common approaches I've seen:
1. Policy-only ("don't paste sensitive data into ChatGPT"): doesn't work; people do it anyway
2. Network-level blocking (block api.openai.com): kills legitimate use cases
3. DLP on the network layer: can't inspect the semantic content; it just sees HTTPS traffic

What actually works is a proxy-based approach: all LLM API traffic routes through a governance layer that scans the content of every request. PII gets tokenized before reaching the model (an SSN becomes [SSN_TOKEN_1], the model processes it fine, and the token gets restored on the way back). Secrets get blocked. Data classification determines which model can be used: RESTRICTED data stays on-prem or goes to a private deployment, while PUBLIC data can go to any provider.

The technical challenge is doing this without killing latency or breaking streaming. We got it down to ~250ms overhead with deterministic scanning (regex + normalization, not a second LLM). For streaming responses, you buffer the full response, scan it, then re-chunk, so PII never leaks through stream chunks.

The org challenge is that engineering teams will route around any governance that adds friction. The proxy approach works because it's transparent: agents point at "proxy.server.com" instead of api.openai.com and everything else stays the same. It's an env var change, no code changes.
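A minimal Python sketch of the tokenize/restore step, assuming US SSNs in the NNN-NN-NNNN form are the only PII type and that "normalization" means Unicode NFKC folding plus stripping zero-width characters. That reading of "normalization" is my guess, not necessarily what this proxy does. `normalize`, `tokenize`, `detokenize` and the `vault` dict are hypothetical names; a real proxy would keep the vault per request and cover far more patterns.

```python
import re
import unicodedata

# Hypothetical pattern: US SSNs in the NNN-NN-NNNN form only.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Zero-width characters that could be used to split a value past the regex.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))


def normalize(text: str) -> str:
    """Fold compatibility forms (e.g. full-width digits) and delete zero-width chars."""
    return unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH)


def tokenize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct SSN with [SSN_TOKEN_n]; return masked text and token->value map."""
    vault: dict[str, str] = {}
    seen: dict[str, str] = {}

    def swap(match: re.Match) -> str:
        value = match.group(0)
        if value not in seen:
            token = f"[SSN_TOKEN_{len(seen) + 1}]"
            seen[value] = token
            vault[token] = value
        return seen[value]

    return SSN_RE.sub(swap, normalize(text)), vault


def detokenize(text: str, vault: dict[str, str]) -> str:
    """Restore the original values in the model's response."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text
```

For example, `tokenize("Customer SSN is 123-45-6789.")` returns `("Customer SSN is [SSN_TOKEN_1].", {"[SSN_TOKEN_1]": "123-45-6789"})`, and `detokenize` puts the real value back into whatever the model says about `[SSN_TOKEN_1]`. Repeated occurrences of the same SSN share one token, so the model can still tell distinct values apart.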
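The secret-blocking and classification-routing rules can be sketched the same way. Everything here is illustrative: the two secret patterns (AWS access key IDs, PEM private key headers), the backend names in `ALLOWED_TARGETS`, and falling back to on-prem instead of rejecting are all assumptions. The sketch also takes the classification label as an input rather than showing how it is derived.

```python
import re
from enum import Enum


class Classification(Enum):
    PUBLIC = "PUBLIC"
    RESTRICTED = "RESTRICTED"


class BlockedRequest(Exception):
    """Raised when a request must not leave the proxy at all."""


# Hypothetical secret patterns; a real deployment would carry many more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}

# Hypothetical backend names: which targets each classification may reach.
ALLOWED_TARGETS: dict[Classification, set[str]] = {
    Classification.PUBLIC: {"openai", "on_prem", "private_deployment"},
    Classification.RESTRICTED: {"on_prem", "private_deployment"},
}
FALLBACK_TARGET = "on_prem"


def check_secrets(body: str) -> None:
    """Block the request outright if any secret pattern matches."""
    for name, pattern in SECRET_PATTERNS.items():
        if pattern.search(body):
            raise BlockedRequest(f"request blocked: looks like a {name}")


def choose_target(classification: Classification, requested: str) -> str:
    """Honour the requested backend only if the classification allows it."""
    allowed = ALLOWED_TARGETS[classification]
    return requested if requested in allowed else FALLBACK_TARGET
```

With these tables, `choose_target(Classification.RESTRICTED, "openai")` returns `"on_prem"`, while a PUBLIC request for `"openai"` goes through unchanged. Quietly rerouting keeps the caller's request working, which fits the low-friction goal; rejecting would be the stricter alternative.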
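Buffering matters for streams because a per-chunk scanner can miss a value split across two chunks. The sketch below assumes plain-text chunks (no SSE framing) and uses a simple SSN redactor as a stand-in for the real response scanner. It joins the upstream chunks, scans the full text once, and re-chunks it. `redact`, `buffered_stream` and `chunk_size` are hypothetical names.

```python
import re
from collections.abc import Iterable, Iterator

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Stand-in response scanner: mask any raw SSN the model emits."""
    return SSN_RE.sub("[REDACTED_SSN]", text)


def buffered_stream(chunks: Iterable[str], chunk_size: int = 16) -> Iterator[str]:
    """Collect the whole upstream stream, scan it once, then re-emit fixed-size chunks."""
    full = "".join(chunks)  # wait for the complete response
    clean = redact(full)    # one scan sees values that straddle chunk boundaries
    for i in range(0, len(clean), chunk_size):
        yield clean[i : i + chunk_size]


upstream = ["Your SSN is 123-4", "5-6789, keep it safe."]
# Per-chunk scanning: neither fragment matches, so the SSN slips through.
naive = "".join(redact(c) for c in upstream)
# Buffered scanning: "Your SSN is [REDACTED_SSN], keep it safe."
safe = "".join(buffered_stream(upstream))
```

The cost is that the client sees no output until the upstream response is complete; re-chunking preserves the streaming interface, not the incremental timing.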