by sqrt17 1637 days ago
At our (fairly large) company, you can query by team (and maybe job role) but it will hide responses where the sample size is smaller than a set number (I think 8 or 10).

So yes it can be done but people have to actually care about it.
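For illustration, the threshold rule described above can be sketched roughly like this. This is a minimal sketch, not how any particular survey tool does it: the function name, the `(team, score)` data shape, and the threshold of 8 are all assumptions (the comment only recalls "8 or 10").

```python
from collections import defaultdict
from statistics import mean

# Assumed threshold; the comment above recalls "8 or 10".
MIN_GROUP_SIZE = 8


def aggregate_by_team(responses, min_group_size=MIN_GROUP_SIZE):
    """Return the mean score per team, or None for teams below the threshold.

    `responses` is a hypothetical list of (team, score) pairs.
    """
    by_team = defaultdict(list)
    for team, score in responses:
        by_team[team].append(score)
    return {
        team: (mean(scores) if len(scores) >= min_group_size else None)
        for team, scores in by_team.items()
    }


# Made-up data: 9 responses from one team, 3 from another.
responses = [("platform", 4)] * 9 + [("legal", 2)] * 3
result = aggregate_by_team(responses)
```

Here `result["platform"]` is an average, while `result["legal"]` is suppressed (`None`) because only 3 people answered.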

The cautionary tale about k-anonymity (from Aaron Schwartz's book, I think) is the case where the behavior of aggregates is itself something that should be kept private: the example was that the morning run at an army base in a foreign country was revealed because enough people ran it with their smartwatches on that it formed a neat cluster.

3 comments

Isn’t location data particularly easy to de-anonymize? I remember reading some research showing that, because people tend to be so consistent with their location, you could de-anonymize most people in a dataset with 3 random location samples through the day.
This is likely the paper you are referring to:

https://www.nature.com/articles/srep01376

It states that 95% of people can be identified from just 4 location samples.

More details on the Strava Run incident: https://www.bbc.com/news/technology-42853072
In Germany (I think in all of the EU), a dataset can only be published if the sample size is at least 7.
Just fwiw sample size isn’t a robust defence against this kind of attack. Check out Differential Privacy.
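To make that concrete, here is a rough sketch of one commonly cited problem with size thresholds (a differencing attack) and of the Laplace mechanism, a basic building block of differential privacy. All numbers are made up, `dp_count` is a hypothetical helper, and the epsilon of 0.5 is an arbitrary assumption; real deployments also have to track a privacy budget across queries.

```python
import random


def dp_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1).

    The difference of two i.i.d. exponentials with rate `epsilon`
    is Laplace-distributed with scale 1/epsilon.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Differencing attack on exact counts. Both queries cover groups larger
# than a threshold of 8 (hypothetical sizes 12 and 11), yet subtracting
# the answers isolates a single person.
unhappy_in_team = 6                # hypothetical: whole team of 12
unhappy_in_team_minus_alice = 5    # hypothetical: same team without Alice
leak = unhappy_in_team - unhappy_in_team_minus_alice  # 1 => Alice is unhappy

# With noisy answers, the same subtraction no longer reveals Alice's
# response with certainty.
rng = random.Random(0)  # seeded only to make the sketch reproducible
noisy_leak = (
    dp_count(unhappy_in_team, 0.5, rng)
    - dp_count(unhappy_in_team_minus_alice, 0.5, rng)
)
```

With exact counts, `leak` is exactly 1. With noise of scale 1/epsilon = 2 on each answer, `noisy_leak` is dominated by the noise, which is the point: smaller epsilon means more privacy and less accurate answers.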