Hacker News
by Finnoid 72 days ago
Interesting! I notice you mention phone numbers but not names. Can pii-hound also detect things like first and last names in the data? I know that might not be the use case you’re primarily solving for, but I’m finding that as organizations use AI to process data, it’s becoming more important to be able to scrub out any PII that might involve user or customer names. I’d love a lightweight CLI tool to do that for me.
1 comment

That is a good question. No, we don't do anything with names at the moment. Names are hard because they don't follow a pattern. The next version will flag columns named first_name, last_name, fullname, or customer_name. That should be published later today.
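
For illustration only, a column-name check along those lines might look like the Python sketch below. This is not pii-hound's actual code; the function name `flag_name_columns` and the case-insensitive, whitespace-trimmed matching are assumptions made for the example.

```python
# Column names mentioned above as candidates for flagging.
NAME_COLUMNS = {"first_name", "last_name", "fullname", "customer_name"}

def flag_name_columns(columns):
    """Return the columns whose normalized name is in NAME_COLUMNS.

    Normalization (strip + lowercase) is an assumption for this sketch.
    """
    return [c for c in columns if c.strip().lower() in NAME_COLUMNS]

flagged = flag_name_columns(["id", "First_Name", "email", "customer_name"])
print(flagged)
```

Matching on the column name alone is cheap, but it misses name data stored in columns with unrelated names.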

Beyond that, pii-hound supports custom rules. A user could create some rules to match known names if they wanted.
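
As a rough idea of what matching known names could involve, here is a Python sketch that builds one regular expression from a list of names. It does not use pii-hound's custom rule format, which isn't shown in this thread; `build_name_pattern` and `find_names` are hypothetical helpers.

```python
import re

def build_name_pattern(known_names):
    """Compile one case-insensitive pattern matching any known name as a whole word."""
    if not known_names:
        raise ValueError("known_names must not be empty")
    # Longest names first so "Ada Lovelace" wins over "Ada".
    escaped = sorted((re.escape(n) for n in known_names), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

def find_names(text, pattern):
    """Return every substring of text that matches a known name."""
    return [m.group(0) for m in pattern.finditer(text)]

pattern = build_name_pattern(["Ada Lovelace", "Ada"])
print(find_names("Contact ada lovelace or ADA.", pattern))
```

This only catches names already on the list, so it fits a known customer roster better than arbitrary data.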

I am open to ideas of other ways to close that gap.

I don’t know if this is viable, but I wonder if you could package a small open source LLM and feed the data through it in chunks to scrub names. I’m sure it would add to the processing time and bring a bunch of other issues. But just a thought.
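
To make the chunking part concrete, here is a Python sketch. The model call itself is left out: `scrub` is a placeholder for whatever function would wrap the LLM, and `chunk_lines`, `scrub_in_chunks`, and the `max_chars` limit are all hypothetical names and values.

```python
def chunk_lines(lines, max_chars):
    """Yield lists of lines whose combined length stays within max_chars.

    A single line longer than max_chars still gets its own chunk.
    """
    chunk, size = [], 0
    for line in lines:
        if chunk and size + len(line) > max_chars:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield chunk

def scrub_in_chunks(lines, scrub, max_chars=2000):
    """Apply scrub (a callable taking and returning a list of lines) chunk by chunk."""
    out = []
    for chunk in chunk_lines(lines, max_chars):
        out.extend(scrub(chunk))
    return out

# Stand-in scrubber for demonstration; a real one would call the model.
print(scrub_in_chunks(["a", "b"], lambda c: [s.upper() for s in c], max_chars=1))
```

Splitting on line boundaries keeps each record intact, though a name that spans two chunks could still slip through.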