by vlthr 2603 days ago
There is definitely some distinction to be made between Google's and Facebook's approaches to privacy, but data anonymization is more of a PR technique than a privacy one. It can be done if you accept that you may lose nearly all of the valuable structure in the data, but that is always going to be a hard sell.

Recently there has been a lot of discussion in Sweden (maybe elsewhere too) about anonymized mobile phone location data that is sold online. In that case "data anonymization" usually meant swapping out personal identifiers for some token. If that were the only information you had, you'd be more or less fine, but what if you have access to some correlated side-channel information that IS personally linked? In the location data example, just combining it with publicly available home address data is enough to de-anonymize nearly every person in the dataset (i.e. where does anonymous token X go every night and leave from every morning?).
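To make the linkage concrete, here is a rough sketch of the "where does token X sleep" join. Everything in it is made up for illustration: the `pings` records, the grid cell names, the `address_register` lookup and the 22:00 to 06:00 night window are all assumptions, not the format of any real dataset.

```python
from collections import Counter, defaultdict

# Hypothetical pseudonymized pings: (token, hour_of_day, grid_cell).
pings = [
    ("tok_a", 2, "cell_17"),
    ("tok_a", 3, "cell_17"),
    ("tok_a", 14, "cell_40"),
    ("tok_a", 23, "cell_17"),
    ("tok_b", 1, "cell_22"),
    ("tok_b", 4, "cell_22"),
    ("tok_b", 13, "cell_40"),
]

# Hypothetical public address register: grid_cell -> registered residents.
address_register = {
    "cell_17": ["Alice Andersson"],
    "cell_22": ["Bo Berg"],
    "cell_40": ["Office Park AB"],
}


def is_night(hour):
    # Assumed night window: 22:00 up to (not including) 06:00.
    return hour >= 22 or hour < 6


def infer_home_cells(pings):
    # For each token, pick the cell it is seen in most often at night.
    night_counts = defaultdict(Counter)
    for token, hour, cell in pings:
        if is_night(hour):
            night_counts[token][cell] += 1
    return {
        token: counts.most_common(1)[0][0]
        for token, counts in night_counts.items()
    }


def reidentify(pings, address_register):
    # Join the inferred home cell against the public register.
    homes = infer_home_cells(pings)
    return {token: address_register.get(cell, []) for token, cell in homes.items()}


result = reidentify(pings, address_register)
print(result)
```

With this toy data each token maps to exactly one resident. Real data is noisier, but the token never needs to be reversed: the movement pattern itself does the identifying.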

This problem emerges very quickly as soon as you start linking together multiple pieces of anonymized data (or just sampling the data with high enough resolution). The only real virtue of data anonymization is that it prevents casual snooping by the people that work with the data.

1 comment

Essentially, that data is not anonymized but pseudonymous. Dropping both the identifier and the timing data would get much closer to anonymous.

True bounded anonymous data is done by aggregation and/or mixing.
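As a rough illustration of the aggregation side, here is a minimal sketch that drops the token entirely and publishes only per-bucket counts, suppressing any bucket with fewer than `k` distinct tokens. The record layout, the `aggregate` helper and the threshold `k=3` are assumptions chosen for the example. Small-count suppression on its own is not a formal guarantee, since overlapping releases can still be differenced against each other.

```python
from collections import defaultdict

# Hypothetical pseudonymized pings: (token, hour_of_day, grid_cell).
pings = [
    ("tok_a", 14, "cell_40"),
    ("tok_b", 14, "cell_40"),
    ("tok_c", 14, "cell_40"),
    ("tok_a", 14, "cell_40"),  # repeat ping, must not inflate the count
    ("tok_a", 2, "cell_17"),   # only one person here at night
]


def aggregate(pings, k=3):
    # Collect the distinct tokens seen in each (cell, hour) bucket.
    tokens_per_bucket = defaultdict(set)
    for token, hour, cell in pings:
        tokens_per_bucket[(cell, hour)].add(token)
    # Publish counts only; suppress buckets with fewer than k distinct tokens.
    return {
        bucket: len(tokens)
        for bucket, tokens in tokens_per_bucket.items()
        if len(tokens) >= k
    }


counts = aggregate(pings)
print(counts)
```

The released table has no per-person rows left to join against, which is also why most of the fine-grained structure in the data is gone.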