by bob1029 174 days ago
This is interesting work. My approach so far has been to keep PII as far away from the LLM as possible. Right now, if PII reaches it at all, it's only as salted hashes.
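A minimal sketch of the salted-hash idea. It assumes the "salt" is a single secret key applied with HMAC-SHA256. A per-record random salt would not give the same token for the same input across requests. `SECRET_KEY`, `pii_token`, and the field names are hypothetical. In practice the key would come from a secrets store rather than being generated at startup.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key; generated here only so the sketch runs standalone.
# It must never be sent to the LLM or stored next to the tokens.
SECRET_KEY = secrets.token_bytes(32)

def pii_token(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a PII value with a keyed hash before it reaches the LLM."""
    # Assumed normalization: trim and lowercase so trivial variants match.
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated to 64 bits to keep prompts short (an assumption, not a requirement).
    return digest[:16]

record = {"name": "Jane Doe", "email": "jane@example.com", "balance": 1234.56}
PII_FIELDS = {"name", "email"}  # hypothetical list of fields treated as PII
safe = {k: pii_token(v) if k in PII_FIELDS else v for k, v in record.items()}
print(safe)
```

Because the same key is reused, the model can still tell that two mentions refer to the same person without ever seeing who it is.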

I would be tempted to try a pseudonymous approach, where inbound PII is mapped to a set of consistent, "known good" fake identities as data passes into and out of the AI layer.
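A rough sketch of that pseudonymous approach. It assumes a fixed pool of fake names and a caller that already knows which real names appear in the text. `FAKE_NAMES` and `Pseudonymizer` are hypothetical. Plain string replacement stands in for proper entity detection.

```python
# Hypothetical pool of "known good" fake identities; a real one would be much
# larger and checked so that no entry collides with a real customer.
FAKE_NAMES = ["Alex Rivera", "Sam Chen", "Jordan Patel", "Taylor Brooks"]

class Pseudonymizer:
    """Maps real names to stable fake names on the way in, and back on the way out."""

    def __init__(self, pool):
        self.pool = list(pool)
        self.real_to_fake = {}
        self.fake_to_real = {}

    def fake_for(self, real):
        # The same real name always gets the same fake within this mapping.
        if real not in self.real_to_fake:
            if len(self.real_to_fake) >= len(self.pool):
                raise RuntimeError("fake identity pool exhausted")
            fake = self.pool[len(self.real_to_fake)]
            self.real_to_fake[real] = fake
            self.fake_to_real[fake] = real
        return self.real_to_fake[real]

    def to_llm(self, text, real_names):
        # Assign fakes in the order given so the mapping is predictable...
        for real in real_names:
            self.fake_for(real)
        # ...then substitute longest names first so "Jane Doe" wins over "Jane".
        for real in sorted(real_names, key=len, reverse=True):
            text = text.replace(real, self.real_to_fake[real])
        return text

    def from_llm(self, text):
        # Reverse the mapping on the model's output, longest fakes first.
        for fake in sorted(self.fake_to_real, key=len, reverse=True):
            text = text.replace(fake, self.fake_to_real[fake])
        return text

p = Pseudonymizer(FAKE_NAMES)
prompt = p.to_llm("Summarize Jane Doe's last call with Bob Smith.", ["Jane Doe", "Bob Smith"])
print(prompt)  # Summarize Alex Rivera's last call with Sam Chen.
print(p.from_llm("Alex Rivera should follow up with Sam Chen."))
```

The mapping table is the sensitive part here. It stays on your side of the boundary, and the model only ever sees plausible but fake people.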

The key with PII is to avoid combining factors over time into a strong identifying signal. This is a wide spectrum. Some attributes will be slightly identifying just because they are rare. Zip+gender isn't a very strong signal, but zip+DOB+gender uniquely identifies a large number of people. You don't need to leak an email address or tax ID to expose someone: account balance over time might eventually be enough to single out one person.
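One way to make the "combining factors" point measurable is a k-anonymity style check. It counts how many records have a combination of quasi-identifiers that no other record shares. The function name and the toy records below are invented for illustration. A real assessment would also consider the wider population, not just your own table.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose combination of quasi-identifier values appears
    exactly once, i.e. records that fail k-anonymity with k = 2."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    singletons = sum(1 for k in keys if counts[k] == 1)
    return singletons / len(records)

# Toy data, made up for this example.
people = [
    {"zip": "02139", "gender": "F", "dob": "1980-03-14"},
    {"zip": "02139", "gender": "F", "dob": "1975-07-02"},
    {"zip": "02139", "gender": "M", "dob": "1980-03-14"},
    {"zip": "02139", "gender": "M", "dob": "1991-11-30"},
    {"zip": "94110", "gender": "F", "dob": "1980-03-14"},
    {"zip": "94110", "gender": "F", "dob": "1962-01-09"},
]

for qis in (["zip", "gender"], ["zip", "gender", "dob"]):
    print(qis, unique_fraction(people, qis))
# ['zip', 'gender'] 0.0
# ['zip', 'gender', 'dob'] 1.0
```

In the toy data no one is unique on zip+gender, but adding DOB makes every record unique. The same check can be run on whatever combination of fields is about to cross into the AI layer.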