by cantor_S_drug 304 days ago
Tangentially related: it is possible to deanonymize users in a supposedly anonymized dataset, such as a Kaggle dataset or the Netflix Prize competition data.

https://medium.com/@EmiLabsTech/data-privacy-the-netflix-pri...

Compared to the example of the medical records, Netflix had been very careful not to include any data that could identify a user, such as zip code, birthdate, and of course name, personal IDs, etc. Nevertheless, only a couple of weeks after the release, another PhD student, Arvind Narayanan, announced that he (together with his advisor, Vitaly Shmatikov) had been able to connect many of the unique IDs in the Netflix dataset to real people by cross-referencing another publicly available dataset: the movie ratings on the IMDb site, where many users post publicly under their own names.
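
To make the cross-referencing idea concrete, here is a toy sketch of a linkage attack. It is not the paper's actual algorithm, only loosely in its spirit: the data, the `link` function, and the tolerance and margin parameters are all made up for illustration. The idea is that ratings of rarely rated movies count for more, and a match is only claimed when the best candidate clearly beats the runner-up.

```python
import math
from collections import Counter

# Toy data, entirely made up: {record_id: {movie: (stars, day_number)}}
anonymized = {
    "u1": {"Heat": (5, 10), "Brazil": (4, 12), "Primer": (5, 40)},
    "u2": {"Heat": (4, 11), "Titanic": (3, 15)},
    "u3": {"Titanic": (5, 3), "Heat": (2, 30), "Brazil": (1, 31)},
}
# Hypothetical public profile: a few ratings posted under a real name.
public = {"alice": {"Brazil": (4, 13), "Primer": (5, 42)}}


def link(profile, records, star_tol=1, day_tol=14, margin=1.5):
    """Return the record id that best matches profile, or None if ambiguous.

    Assumes records holds at least two entries.
    """
    # How many records rate each movie; rare movies are more identifying.
    popularity = Counter(m for rec in records.values() for m in rec)
    scores = {}
    for rid, rec in records.items():
        score = 0.0
        for movie, (stars, day) in profile.items():
            if movie not in rec:
                continue
            rec_stars, rec_day = rec[movie]
            # Approximate match: public ratings and dates may be slightly off.
            if abs(stars - rec_stars) <= star_tol and abs(day - rec_day) <= day_tol:
                # log(1 + n) avoids dividing by zero when n == 1.
                score += 1.0 / math.log(1 + popularity[movie])
        scores[rid] = score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # Only claim a match if the best score clearly stands out.
    if best[1] > 0 and best[1] >= margin * runner_up[1]:
        return best[0]
    return None


match = link(public["alice"], anonymized)
print(match)
```

Even in this tiny example, two approximately matching ratings are enough to single out one record, because one of the movies is rated by nobody else. No names or zip codes are needed, just sparse data with a long tail.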

https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf

https://courses.csail.mit.edu/6.857/2018/project/Archie-Gers...