by dTal 4602 days ago
Some property of the data that makes it easily identifiable? Such as someone's entire medical history, including age, sex, number of children, and "race"? Just wiping someone's name is not going to "anonymize" that kind of data.

There's something deeper here - Larry Page is thinking in terms of "big data" - "wouldn't it be nice if we could run all kinds of multivariate analyses on everything about everyone?" - but it's fundamentally impossible to do that while preserving anonymity. The same information that might reveal interesting correlations is vulnerable to correlation attacks. Say you wanted to know how nutrition affected immune system response. You'd look at things like height, weight, diet, frequency of minor illness, and rough location (city level, to control illness rate against those around you). To be of any use, the "anonymized" database is going to have to have all those variables correlated per person, which means if I know you're 6'1", eat a lot of bananas, live in Silicon Valley and had a cold last year, I can potentially look you up and find out other things about you.
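
To make the lookup concrete, here's a rough sketch of that kind of correlation attack in Python. Everything in it is made up for illustration: the toy records, the field names, and the `link` helper are my own assumptions, not any real dataset or API. The point is only that each extra fact I know about you shrinks the candidate set, and once it hits one row I get the rest of that row for free.

```python
# Toy "anonymized" table: names stripped, but every attribute still tied to one person.
# All rows and field names are hypothetical.
records = [
    {"height_in": 73, "diet": "bananas", "city": "Silicon Valley", "colds_last_year": 1, "other_condition": "asthma"},
    {"height_in": 73, "diet": "bananas", "city": "Silicon Valley", "colds_last_year": 0, "other_condition": "none"},
    {"height_in": 73, "diet": "pasta",   "city": "Silicon Valley", "colds_last_year": 1, "other_condition": "diabetes"},
    {"height_in": 66, "diet": "bananas", "city": "Silicon Valley", "colds_last_year": 1, "other_condition": "none"},
    {"height_in": 73, "diet": "bananas", "city": "Seattle",        "colds_last_year": 1, "other_condition": "hypertension"},
]


def link(rows, **known):
    """Return the rows consistent with everything the attacker already knows."""
    return [r for r in rows if all(r.get(k) == v for k, v in known.items())]


# Each additional known attribute narrows the candidates.
by_city = link(records, city="Silicon Valley")
by_city_height = link(records, city="Silicon Valley", height_in=73)
full_match = link(
    records,
    city="Silicon Valley",
    height_in=73,
    diet="bananas",
    colds_last_year=1,
)

print(len(by_city), len(by_city_height), len(full_match))
if len(full_match) == 1:
    # A unique match exposes the fields the attacker did NOT know beforehand.
    print("learned:", full_match[0]["other_condition"])
```

None of the individual fields identifies anyone on its own; it's the combination, kept together per person, that does. And that per-person combination is exactly what the analysis needs.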

It's worth remembering, whenever a high-level Googler talks about the social/medical potential of large-scale data analysis like that, that the analysis being described is inherently hostile to privacy.