| HN Mirror

by sqrt17 1637 days ago

At our (fairly large) company, you can query by team (and maybe job role) but it will hide responses where the sample size is smaller than a set number (I think 8 or 10).
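A minimal sketch of that kind of small-group suppression, for illustration only. The function name, the `(team, score)` input shape, and the threshold of 8 are assumptions, not how any particular survey tool does it.

```python
from collections import defaultdict
from statistics import mean

MIN_GROUP_SIZE = 8  # assumed threshold; the comment above only recalls "8 or 10"


def aggregate_by_team(responses, min_group_size=MIN_GROUP_SIZE):
    """Return the mean score per team, or None where the group is too small.

    `responses` is an iterable of (team, score) pairs (hypothetical shape).
    """
    by_team = defaultdict(list)
    for team, score in responses:
        by_team[team].append(score)
    # Suppress any group below the threshold instead of reporting its mean.
    return {
        team: (mean(scores) if len(scores) >= min_group_size else None)
        for team, scores in by_team.items()
    }


# Nine responses from one team, three from another.
responses = [("platform", 4)] * 9 + [("design", 2)] * 3
result = aggregate_by_team(responses)
```

Here `result["platform"]` is reported and `result["design"]` is hidden, because only three people answered.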

So yes it can be done but people have to actually care about it.

The cautionary tale about k-anonymity (from Aaron Schwartz's book I think) is when the behavior of aggregates is also something that should be kept private. The example was that the morning run at an army base in a foreign country was revealed because enough people ran it with their smartwatches on that it formed a neat cluster.

3 comments

pfraze 1637 days ago

Isn’t location data particularly easy to de-anonymize? I remember reading some research showing that, because people tend to be so consistent with their location, you could de-anonymize most people in a dataset with 3 random location samples taken through the day.

morganherlocker 1637 days ago

This is likely the paper you are referring to:

https://www.nature.com/articles/srep01376

It states that 95% of people can be identified from just 4 location samples.

ocdtrekkie 1637 days ago

More details on the Strava Run incident: https://www.bbc.com/news/technology-42853072

teraku 1637 days ago

In Germany (I think all of the EU), a dataset can only be published if the sample size is at least 7.

williamtrask 1637 days ago

Just fwiw sample size isn’t a robust defence against this kind of attack. Check out Differential Privacy.
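One way to see the weakness: two queries that each pass the size threshold can differ by a single person, so subtracting their results reveals that person's answer. As a rough sketch of the differential privacy idea, here is the Laplace mechanism applied to a count. The function name and the choice of epsilon are illustrative assumptions, and a real deployment would also need to track a privacy budget across queries.

```python
import random


def noisy_count(true_count, epsilon, rng=random):
    """Return a count with Laplace noise added (Laplace mechanism sketch).

    A counting query changes by at most 1 when one person is added or
    removed (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    scale = 1.0 / epsilon
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that same scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Smaller epsilon means more noise and stronger privacy.
rng = random.Random(0)  # seeded only to make the example repeatable
released = noisy_count(100, epsilon=1.0, rng=rng)
```

Each released value is off by a random amount, so the difference between two neighbouring queries no longer pins down one individual's answer exactly.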
