Hacker News
Ask HN: What are some clever ways data in public domain has been de-anonymized?
3 points by danielhughes 4125 days ago
3 comments

Stylometric analysis of the Federalist Papers[0] comes to mind. The papers were published under a shared pseudonym, so for some of them it was not known which individual author wrote them. Counting words and matching the word-frequency distributions of the unattributed papers against those of papers whose author was known made it reasonably straightforward to identify the likely authors.

[0] A set of historical papers of great political importance. http://en.wikipedia.org/wiki/The_Federalist_Papers
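A minimal sketch of the word-frequency matching described above. It assumes a small, hand-picked list of function words, and the word list, the toy texts, and the names `profile` and `attribute` are all hypothetical. Each text becomes a vector of relative frequencies, and an unattributed text goes to the author whose pooled vector is most similar by cosine similarity. This is a simple nearest-profile classifier, not the method used in any particular published study.

```python
import math
import re
from collections import Counter

# Hypothetical marker words; real studies choose their own, often hundreds of them.
FUNCTION_WORDS = ["upon", "on", "while", "whilst", "the", "of", "to", "by"]

def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def attribute(unknown: str, labelled: dict[str, list[str]]) -> str:
    """Return the author whose pooled word profile best matches the unknown text."""
    target = profile(unknown)
    scores = {
        author: cosine(target, profile(" ".join(texts)))
        for author, texts in labelled.items()
    }
    return max(scores, key=scores.get)

# Toy, made-up corpus: author_a favours "upon"/"while", author_b favours "on"/"whilst".
labelled = {
    "author_a": ["upon the question we rely upon the people while they vote"],
    "author_b": ["whilst on the question we rely on the people whilst they vote"],
}
print(attribute("whilst on the matter we rely on reason", labelled))  # prints author_b
```

Function words are a common choice for this kind of comparison because their frequencies tend to reflect an author's habits rather than the topic of a particular paper.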

I can't recall the article, but there was a case where public data was de-anonymized by cross-referencing dates of birth and ZIP codes, and it was remarkably successful for the state where it was tried.
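A rough sketch of how that kind of linkage could work, assuming one "anonymized" dataset that still carries date of birth and ZIP code, and one public dataset (for example, a voter roll) that pairs the same fields with names. The function `link`, the field names, and all records below are hypothetical and made up for illustration. An anonymized record counts as re-identified only when its (DOB, ZIP) combination matches exactly one named person.

```python
from collections import defaultdict

def link(anonymized, identified, keys=("dob", "zip")):
    """Map record_id -> name for anonymized records whose quasi-identifiers
    match exactly one person in the identified dataset."""
    # Index the named records by their quasi-identifier tuple.
    index = defaultdict(list)
    for person in identified:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = {}
    for rec in anonymized:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # unique match: the record is re-identified
            matches[rec["record_id"]] = candidates[0]
    return matches

# Made-up example data.
anonymized = [
    {"record_id": 1, "dob": "1970-01-02", "zip": "12345", "diagnosis": "flu"},
    {"record_id": 2, "dob": "1985-06-15", "zip": "12346", "diagnosis": "asthma"},
    {"record_id": 3, "dob": "1990-03-30", "zip": "12347", "diagnosis": "fracture"},
]
identified = [
    {"name": "Alice Example", "dob": "1970-01-02", "zip": "12345"},
    {"name": "Bob Example", "dob": "1985-06-15", "zip": "12346"},
    {"name": "Carol Example", "dob": "1985-06-15", "zip": "12346"},
]
print(link(anonymized, identified))  # prints {1: 'Alice Example'}
```

Record 2 stays anonymous here only because two people share its DOB and ZIP; the attack succeeds wherever such combinations are unique in the population, which is why these fields are risky to release together even without names.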