Hacker News
by boron1006 2512 days ago
For context, this was when ISPs were planning on selling data, and someone was collecting donations saying they'd reidentify senators' internet history. I said that people shouldn't donate to them, because it wasn't even clear what the ISPs would release. Their point was that it doesn't matter what the ISPs release; they could reidentify anyone with deep learning.

> And it's not intuitively obvious which combinations of values allow you to recover which other ones.

I think it's pretty intuitive that Zip Code and DOB are identifiers. That's why they count as such in HIPAA, and are used to demonstrate identity by governments, credit card companies, etc.

Personally I think this stuff just poisons the well when it comes to discussions of privacy. I think the goal is to remove the expectation of anonymity by claiming that it's never possible.

2 comments

> I think it's pretty intuitive that Zip Code and DOB are identifiers.

It's great that you think that, but basically no company uses that definition. Most company privacy policies don't consider combinations of information when making this determination. E.g. your billing address might be personal information, but your zip code by itself might not. Similarly, IP address (with or without last octet), wifi SSID, location data, browsing history (or attributes derived from browsing history), and so on. Each individual piece of data isn't enough to personally identify you, so the privacy policy often doesn't have to be applied to it.
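To make the "each piece alone isn't identifying" point concrete, here is a rough sketch that counts how many rows in a table become unique as fields are combined. The records, the field names, and the `unique_fraction` helper are all made up for illustration; real data will behave differently.

```python
from collections import Counter

# Toy records, invented for illustration only.
records = [
    {"zip": "94110", "dob": "1985-03-02", "sex": "F"},
    {"zip": "94110", "dob": "1985-03-02", "sex": "M"},
    {"zip": "94110", "dob": "1990-07-14", "sex": "F"},
    {"zip": "94703", "dob": "1985-03-02", "sex": "F"},
    {"zip": "94703", "dob": "1990-07-14", "sex": "M"},
    {"zip": "94703", "dob": "1990-07-14", "sex": "M"},
]

def unique_fraction(rows, fields):
    """Fraction of rows whose combined values for `fields` appear exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)

# Compare single fields against combinations of fields.
for fields in (["zip"], ["dob"], ["zip", "dob"], ["zip", "dob", "sex"]):
    print(fields, round(unique_fraction(records, fields), 2))
```

In this toy table neither zip nor DOB singles anyone out on its own, but the combinations do, which is exactly the case a per-field privacy policy misses.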

E.g. after reading the Google privacy policy[0], can you tell what protections your zip code and DOB have? Will Google treat them as personal information or personal identifiers or not?

0: https://policies.google.com/privacy?hl=en-US

> I think it's pretty intuitive that Zip Code and DOB are identifiers.

Sure, but what about job title? What about job title when someone's job title is "mayor" or "fire chief" and it's possible to deduce from other information what city they're in? Or someone's job title is just "Governor of the State of California"?

Any collection of random independent characteristics becomes uniquely identifying once you have enough of them. Then all an attacker needs is another database with the same characteristics that also includes names or other identifiers, and they can associate the missing fields from one database with the other.
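A minimal sketch of that kind of linkage, assuming an exact-match join on shared fields. Both tables, the field names, and the `link` helper are hypothetical; a real attack would have to deal with messy, partial, or fuzzy matches.

```python
# "Anonymized" release: names removed, other characteristics kept (made-up data).
released = [
    {"zip": "94110", "dob": "1985-03-02", "job": "teacher", "history": "site-a.example"},
    {"zip": "94703", "dob": "1990-07-14", "job": "fire chief", "history": "site-b.example"},
    {"zip": "94703", "dob": "1990-07-14", "job": "nurse", "history": "site-c.example"},
]

# A second database with the same characteristics plus names (also made up).
public = [
    {"name": "Alice Example", "zip": "94110", "dob": "1985-03-02", "job": "teacher"},
    {"name": "Bob Example", "zip": "94703", "dob": "1990-07-14", "job": "fire chief"},
    {"name": "Carol Example", "zip": "94703", "dob": "1990-07-14", "job": "nurse"},
]

def link(released_rows, public_rows, fields):
    """Map name -> history for released rows that match exactly one named row."""
    index = {}
    for p in public_rows:
        index.setdefault(tuple(p[f] for f in fields), []).append(p["name"])
    linked = {}
    for r in released_rows:
        names = index.get(tuple(r[f] for f in fields), [])
        if len(names) == 1:  # only an unambiguous match counts as reidentified
            linked[names[0]] = r["history"]
    return linked

print(link(released, public, ["zip", "dob"]))         # one unambiguous match
print(link(released, public, ["zip", "dob", "job"]))  # every row matched
```

With zip and DOB alone, two of the three people still hide behind each other; adding job title is enough to separate them, and the "missing" browsing history is attached to a name.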

> Personally I think this stuff just poisons the well when it comes to discussions of privacy. I think the goal is to remove the expectation of anonymity by claiming that it's never possible.

It's not that it's never possible; it's that it's only possible if we don't feed these centralized databases enough information to uniquely identify people. So we need to stop doing that.