Hacker News
by mumblemumble 2406 days ago
More to the point, I'm not even sure that, in principle, it is possible to truly anonymize this data without spoiling the data science easter egg hunt. And the easter egg hunt is arguably the entire point of this kind of data collection.

You can do something like k-anonymizing the data and then destroying the original, personally identifiable data. But k-anonymity has its limits, too.
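To make the k-anonymity idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the records, the choice of zip code and age as quasi-identifiers, and the helper names `generalize` and `k_of` are all hypothetical. It coarsens the quasi-identifiers until every combination is shared by at least two records, which is what "k = 2" means.

```python
from collections import Counter

# Hypothetical records: (zip_code, age, diagnosis). All values are invented.
RECORDS = [
    ("02138", 34, "flu"),
    ("02139", 36, "asthma"),
    ("02141", 52, "flu"),
    ("02142", 57, "diabetes"),
]


def generalize(record):
    """Coarsen the quasi-identifiers: keep 3 zip digits, bucket age by decade."""
    zip_code, age, diagnosis = record
    decade = age // 10 * 10
    return (zip_code[:3] + "**", f"{decade}-{decade + 9}", diagnosis)


def k_of(records):
    """Return the size of the smallest group sharing the same quasi-identifiers."""
    groups = Counter((zip_code, age) for zip_code, age, _ in records)
    return min(groups.values())


raw_k = k_of(RECORDS)                      # every raw record is unique
generalized = [generalize(r) for r in RECORDS]
generalized_k = k_of(generalized)          # each record now shares its group

print(raw_k, generalized_k)
```

The price of getting from k = 1 to k = 2 here is that exact zip codes and ages are gone, which is the kind of detail the easter egg hunt depends on. The sketch also says nothing about the sensitive column: if everyone in a group had the same diagnosis, group membership alone would reveal it.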

Every other strategy I know of assumes that it's OK to keep a private copy of the original data. That works well in a scenario where a source that needs to keep the raw data (like a health care provider) shares it with a semi-trustworthy external party such as a health researcher. But it doesn't address what I'm guessing is the main concern here, which is that, even if you accept for the sake of argument that Google currently has no intention to do gross things with the data, they can't make any promises that will hold indefinitely. It's a long-lived organization whose policies might change with any change in leadership, market, or even political conditions, so any promises they might make are simply meaningless in the long run. The same would be true of any organization, regardless of the presence or absence of any present-day warm fuzzy feelings.

1 comment

Basically, the data, to be useful, has to be capable of uncovering correlations with all sorts of demographics. Those same clues can de-anonymize it. If we take your name out of a data frame (so that we can call it "anonymous") but leave all sorts of other properties (those being the payload useful to data science), you may nevertheless be identifiable from that combination of properties, together with other info known about you from other tracking sources.
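A rough sketch of that kind of linkage, in plain Python. The rows, the column names, the choice of quasi-identifiers, and the `reidentify` helper are all hypothetical, made up only to show the mechanism: a nameless row is re-identified when its combination of properties is unique and the same combination appears next to a name in some other source.

```python
from collections import Counter, defaultdict

# Hypothetical "anonymized" rows: name removed, other properties kept.
ANONYMIZED = [
    {"zip": "02138", "birth_year": 1985, "gender": "F", "searches": "knee surgery"},
    {"zip": "02138", "birth_year": 1985, "gender": "M", "searches": "car loans"},
    {"zip": "02139", "birth_year": 1990, "gender": "F", "searches": "job listings"},
]

# Hypothetical outside source that still carries names.
EXTERNAL = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1985, "gender": "F"},
    {"name": "Bob Example", "zip": "02138", "birth_year": 1985, "gender": "M"},
]

# Assumed quasi-identifiers shared by both sources.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def key(row):
    """Project a row onto its quasi-identifier combination."""
    return tuple(row[q] for q in QUASI_IDENTIFIERS)


def reidentify(anonymized, external):
    """Pair names with payloads wherever the combination is unique on both sides."""
    names_by_key = defaultdict(list)
    for row in external:
        names_by_key[key(row)].append(row["name"])
    anon_counts = Counter(key(row) for row in anonymized)

    matches = []
    for row in anonymized:
        names = names_by_key.get(key(row), [])
        if anon_counts[key(row)] == 1 and len(names) == 1:
            matches.append((names[0], row["searches"]))
    return matches


matches = reidentify(ANONYMIZED, EXTERNAL)
print(matches)
```

In this toy data two of the three "anonymous" rows get a name back, and the third stays unlinked only because the outside source happens not to cover that person.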