by Erlich_Bachman 2406 days ago
Because it is -never- going to be anonymized. They are going to call it anonymized, and then publicly apologize a couple of times when it turns out not to be, while keeping business as usual. There is just no incentive for them to anonymize it.
3 comments

More to the point, I'm not even sure that, in principle, it is possible to truly anonymize this data without spoiling the data science easter egg hunt. And the easter egg hunt is arguably the entire point of this kind of data collection.

You can do something like k-anonymizing the data and then destroying the original, personally identifiable data. But k-anonymity has its limits, too.
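To make the k-anonymity idea concrete, here is a minimal sketch. The records, the choice of age and ZIP as quasi-identifiers, and the helper names `k_anonymity` and `generalize` are all made up for illustration. A table is k-anonymous when every combination of quasi-identifier values is shared by at least k rows.

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

def generalize(row):
    """Coarsen age to a decade band and ZIP to its first three digits (hypothetical scheme)."""
    decade = row["age"] // 10 * 10
    return {**row, "age": f"{decade}-{decade + 9}", "zip": row["zip"][:3] + "**"}

# Toy records, invented for this example.
records = [
    {"age": 34, "zip": "94110", "query": "flu symptoms"},
    {"age": 36, "zip": "94117", "query": "knee pain"},
    {"age": 52, "zip": "10001", "query": "flu symptoms"},
    {"age": 58, "zip": "10003", "query": "insomnia"},
]
QI = ["age", "zip"]

raw_k = k_anonymity(records, QI)          # every row is unique on (age, zip)
generalized = [generalize(r) for r in records]
gen_k = k_anonymity(generalized, QI)      # rows now fall into two groups of two
```

One of the limits: k only counts rows per group. If everyone in a group happens to share the same sensitive value, the group still leaks it, and coarsening the columns far enough to raise k also blurs the correlations people wanted to mine in the first place.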

Every other strategy I know of assumes that it's OK to keep a private copy of the original data. That works well in a scenario where a source that needs to keep the raw data (like a health care provider) shares it with a semi-trustworthy external party such as a health researcher. But it doesn't address what I'm guessing is the main concern here, which is that, even if you accept for the sake of argument that Google currently has no intention to do gross things with the data, they can't make any promises that will hold indefinitely. It's a long-lived organization whose policies might change with any change in leadership, market, or even political conditions, so any promises they might make are simply meaningless in the long run. As they would be with any organization, regardless of the presence or absence of any present-day warm fuzzy feelings.

Basically the data, to be useful, has to be capable of uncovering correlations with all sorts of demographics. Those clues can de-anonymize it. If we take your name out of a data frame (so that we can call it "anonymous") but leave all sorts of other properties (those being the payload useful to data science), you may nevertheless be identifiable from that combination of properties, together with other info known about you from other tracking sources.
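
A rough sketch of that kind of linkage, under made-up assumptions: the "anonymous" table has no names but keeps age, ZIP, and gender, and a second source (hypothetical here) lists names alongside the same properties. The function `link` and all the data are invented for illustration.

```python
def link(anonymous_rows, known_rows, keys):
    """Map each anonymous row index to a name when exactly one known person matches on `keys`."""
    index = {}
    for row in known_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    matches = {}
    for i, row in enumerate(anonymous_rows):
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means the row is re-identified
            matches[i] = candidates[0]
    return matches

# "Anonymized" payload: names removed, demographics kept.
anonymous = [
    {"age": 34, "zip": "94110", "gender": "F", "payload": "search history A"},
    {"age": 41, "zip": "94110", "gender": "M", "payload": "search history B"},
    {"age": 29, "zip": "10001", "gender": "M", "payload": "search history C"},
]

# Side information from some other tracking source (hypothetical).
known = [
    {"name": "Alice", "age": 34, "zip": "94110", "gender": "F"},
    {"name": "Bob", "age": 41, "zip": "94110", "gender": "M"},
    {"name": "Carol", "age": 29, "zip": "10001", "gender": "F"},
    {"name": "Dave", "age": 29, "zip": "10001", "gender": "M"},
    {"name": "Eve", "age": 29, "zip": "10001", "gender": "M"},
]

reidentified = link(anonymous, known, ["age", "zip", "gender"])
```

In this toy data the first two rows each match exactly one known person, while the third matches two people and stays ambiguous. Every extra property left in the payload shrinks those ambiguous groups.
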
Yeah, I totally believe it can be anonymized, but when they attach geo data, age data, etc., you can be picked out of the crowd with just a bit of analysis. Plus I don't think most of them actually anonymize it. They just say they do. There's no one auditing them until a court order happens.
Every organization and project I've worked for, at companies much smaller than Google, has done its best to comply with requirements to eliminate PII.

https://www.gsa.gov/reference/gsa-privacy-program/rules-and-...