Hacker News
by amichal 3111 days ago
"Anonymized" is a statistical measure. What you are doing is making it less likely that someone can be identified, not necessarily making it impossible. I think it would be best if folks were more honest about that. The article mentions finding 7 people in a dataset of 2.9 million. It seems clear that they felt 7 prominent people were enough to tell the story, and that they could likely find many more. My question is: could they find 0.001%, 1%, 10%, or more? If so, with what resources...
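To make the "statistical measure" point concrete, here is a rough sketch of one way to estimate exposure: count the share of records whose combination of quasi-identifiers is unique in the dataset. The function name `unique_fraction`, the field names, and the toy records are all made up for illustration; this is not the method used in the article, and uniqueness is only an upper-level hint at re-identifiability, since an attacker still needs outside data to link against.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Return the fraction of records whose quasi-identifier combination
    appears exactly once in the dataset (hypothetical helper)."""
    if not records:
        return 0.0
    # Build one key per record from the chosen quasi-identifier fields.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    # A record is "unique" if no other record shares its key.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Toy data, invented for this example only.
records = [
    {"zip": "02138", "birth_year": 1970, "sex": "F"},
    {"zip": "02138", "birth_year": 1970, "sex": "F"},
    {"zip": "02139", "birth_year": 1985, "sex": "M"},
    {"zip": "02139", "birth_year": 1990, "sex": "M"},
]

print(unique_fraction(records, ["zip", "birth_year", "sex"]))  # 0.5
print(unique_fraction(records, ["zip"]))                       # 0.0

# For scale: the 7 people mentioned, out of 2.9 million records.
print(f"{7 / 2_900_000:.5%}")  # 0.00024%
```

Adding more fields to `quasi_identifiers` generally pushes the unique share up, which is why "how many could they find" depends so much on what auxiliary data the attacker has.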

Edit: an old and interesting discussion on this: https://news.ycombinator.com/item?id=2942967