Hacker News
by joshuamorton 3071 days ago
Disagree. You can anonymize and aggregate data in such a way that any auxiliary data capable of deanonymizing it would also let you fully reconstruct the information you're after. In that case the aggregated dataset isn't adding any information.

As an example, a list of Google searches, aggregated at the minute level, is a useful dataset, but it won't tell you anything about my search history unless you already have my search history, in which case you already knew the answer.
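A minimal sketch of what "aggregated at the minute level" might mean, assuming raw logs are `(unix_timestamp, query)` pairs. The record format and the name `aggregate_by_minute` are illustrative assumptions, not a description of Google's actual pipeline. Each minute bucket keeps only query counts, so nothing in the output ties a query to a user.

```python
from collections import Counter, defaultdict


def aggregate_by_minute(records):
    """Group (unix_timestamp, query) pairs into per-minute query counts.

    User identity is dropped entirely: each minute maps to a Counter of
    how often each query string was searched during that minute.
    """
    buckets = defaultdict(Counter)
    for ts, query in records:
        minute = ts // 60  # integer minute index since the Unix epoch
        buckets[minute][query] += 1
    return dict(buckets)


# Toy log: three searches in minute 0, one in minute 1.
logs = [(0, "weather"), (30, "news"), (45, "weather"), (61, "recipes")]
buckets = aggregate_by_minute(logs)
print(buckets[0]["weather"], buckets[1]["recipes"])  # → 2 1
```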

To take your example: assume I target you personally and, beyond the list of Google searches, I manage to get hold of the times (with minute-or-better precision) at which you sent requests to Google Search (say, by hacking or subpoenaing your ISP). Taken together, and given enough data, the two datasets would let me build a statistical profile of your likely interests. In the aggregated dataset you're bucketed with lots of people, but each time you search (the second dataset) you're bucketed with a different set of people, so the queries you keep making stand out while everyone else's average away.
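The intersection attack described above can be sketched roughly as follows, assuming the attacker holds the per-minute aggregates plus the set of minutes in which the target searched. The function name `profile_target` and the "lift" score (a query's share of traffic in the target's minutes divided by its share overall) are illustrative choices, not an established tool. Queries the target makes repeatedly drift above 1 because the other people in each bucket change from minute to minute.

```python
from collections import Counter


def profile_target(minute_counts, target_minutes):
    """Score each query by how over-represented it is in the target's minutes.

    minute_counts: dict mapping minute -> Counter of query counts (the aggregate)
    target_minutes: minutes in which the target is known to have searched
    Returns a dict mapping query -> lift; lift > 1 means the query is more
    common in the target's minutes than in the dataset as a whole.
    """
    # Background: each query's total count across every minute.
    overall = Counter()
    for counts in minute_counts.values():
        overall.update(counts)
    total = sum(overall.values())

    # Foreground: each query's count in the target's minutes only.
    seen = Counter()
    for minute in target_minutes:
        seen.update(minute_counts.get(minute, Counter()))
    seen_total = sum(seen.values())
    if seen_total == 0:
        return {}

    # Lift = share among the target's minutes / share overall.
    return {
        q: (seen[q] / seen_total) / (overall[q] / total)
        for q in seen
        if seen[q] > 0
    }


# Toy data: the target searched in minutes 0 and 2; "chess" appears only there.
data = {
    0: Counter({"chess": 1, "news": 3}),
    1: Counter({"news": 4}),
    2: Counter({"chess": 1, "news": 3}),
    3: Counter({"news": 4}),
}
lift = profile_target(data, [0, 2])
print(max(lift, key=lift.get))  # → chess
```

With real traffic the per-minute signal is tiny; the sketch only shows that repeated observations accumulate in the target's favour.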

Gaining access to other data, such as your country of residence plus the aggregated popularity of search terms in each country, would let me refine that profile further.
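One way the country-level refinement could work, as a hedged sketch: treat the ratio of a query's popularity in the target's country to its worldwide popularity as a second, independent piece of evidence and multiply it into the minute-level lift. The independence assumption (naive-Bayes style), the function name `refine_with_country`, and the input dicts are all hypothetical; the numbers below are toy values chosen to be exact in floating point, not real search shares.

```python
def refine_with_country(lift, country_share, global_share):
    """Reweight minute-level lift scores using country-level popularity.

    lift: dict query -> lift from the minute-level analysis
    country_share: dict query -> fraction of searches for the query in the
        target's country (hypothetical aggregate input)
    global_share: dict query -> fraction of searches for the query worldwide
    The country/global ratio is treated as an independent likelihood ratio
    (a naive-Bayes-style assumption), so the two scores are multiplied.
    """
    refined = {}
    for query, score in lift.items():
        c = country_share.get(query)
        g = global_share.get(query)
        if c is None or not g:
            refined[query] = score  # no usable country data: keep as-is
        else:
            refined[query] = score * (c / g)
    return refined


# Two queries look equally suspicious from minute-level data alone;
# country-level popularity (toy numbers) separates them.
lift = {"chess": 2.0, "cricket": 2.0}
country = {"chess": 0.5, "cricket": 0.125}
world = {"chess": 0.25, "cricket": 0.5}
print(refine_with_country(lift, country, world))  # → {'chess': 4.0, 'cricket': 0.5}
```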

That could potentially work, but I expect it would require collecting data over a period longer than the average human lifespan (there are 2.4 million Google searches per minute).
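A back-of-envelope way to see why the amount of data matters, under assumptions the thread itself doesn't state: each minute contains N ≈ 2.4 × 10^6 searches (the figure above); a query q has background share p, so its count in one minute is roughly Poisson with mean Np; and a fraction f of the target's k observed searches are for q. Ignoring the target's own contribution to the variance, the excess count the target adds across their k minutes has to clear z standard deviations of background noise:

```latex
\begin{aligned}
\text{signal} &= k f, \qquad \text{noise} = \sqrt{k N p}, \\
k f \ge z \sqrt{k N p} \;&\Longrightarrow\; k \ge \frac{z^{2} N p}{f^{2}}.
\end{aligned}
```

So the number of observed searches needed grows linearly with N and with the query's background share p, and with the inverse square of how consistently the target searches for it. Whether that exceeds a lifetime of searching depends on p and f, which this thread doesn't pin down.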