by macobo 3888 days ago
De-identification has its limits, and information can still be learned even from anonymized datasets. An alternative is something like Sharemind [1][2], which uses sound cryptography to make secure multi-party computation possible.

[1]: http://sharemind.cyber.ee/

[2]: https://www.youtube.com/watch?v=bAp_aZgX3B0
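To give a feel for how multi-party computation can work without anyone seeing the raw data, here is a toy sketch of additive secret sharing, one common building block for this kind of system. It is not Sharemind's actual protocol. The 2^32 modulus, the three parties, and the hospital counts are all illustrative assumptions.

```python
import secrets

MODULUS = 2**32  # assumption: shares are integers modulo 2^32


def share(value: int, n_parties: int = 3) -> list[int]:
    """Split value into n random shares that sum to value mod MODULUS.

    Any n-1 of the shares look uniformly random, so no single party
    learns the value from its own share.
    """
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares to recover the secret."""
    return sum(shares) % MODULUS


def add_shared(a: list[int], b: list[int]) -> list[int]:
    """Add two shared secrets: each party adds its own shares locally."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


# Hypothetical example: two hospitals each secret-share a private patient count.
count_a = share(120)
count_b = share(75)
total = add_shared(count_a, count_b)
print(reconstruct(total))  # 195
```

Only the final sum is ever reconstructed; the individual counts never leave their owners in the clear. Real systems also need protocols for multiplication and comparison, which are considerably more involved.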


De-identification is perfect for a lot of medical research, where you're looking for general trends and don't need or want specific patient-identifying information sitting around on your hard drive. For example, MIMIC-III is a great database for ICU data. You still have patient ID numbers for tracking events involving a specific patient, but all personal info is de-identified.

http://mimic.mit.edu/about/mimic/
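Roughly, "de-identified but still trackable per patient" means something like the sketch below. It is not how MIMIC actually does it. The field names (`name`, `address`, `phone`, `ssn`, `mrn`) and the random surrogate-ID scheme are hypothetical. The idea is to drop direct identifiers and replace the real record number with a stable surrogate ID, so events from the same patient still line up.

```python
import secrets

# Hypothetical direct-identifier fields to strip from each row.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "ssn"}


class Pseudonymizer:
    """Replace a real patient identifier with a stable, random surrogate ID."""

    def __init__(self) -> None:
        # real identifier -> surrogate ID; this table must be kept secret or destroyed
        self._ids: dict[str, int] = {}
        self._used: set[int] = set()

    def surrogate(self, real_id: str) -> int:
        if real_id not in self._ids:
            # Random rather than hashed, so the ID can't be recomputed from the real one.
            new_id = secrets.randbelow(10**9)
            while new_id in self._used:
                new_id = secrets.randbelow(10**9)
            self._used.add(new_id)
            self._ids[real_id] = new_id
        return self._ids[real_id]

    def deidentify(self, row: dict) -> dict:
        """Drop direct identifiers and swap the record number ("mrn") for a surrogate."""
        out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS and k != "mrn"}
        out["patient_id"] = self.surrogate(row["mrn"])
        return out


p = Pseudonymizer()
visit1 = p.deidentify({"mrn": "A-17", "name": "Jane Doe", "unit": "ICU", "date": "2012-07-14"})
visit2 = p.deidentify({"mrn": "A-17", "name": "Jane Doe", "unit": "ICU", "date": "2012-07-15"})
```

Both visits end up with the same `patient_id` and no name. That stable ID is exactly what makes per-patient research possible, and also what the reply below worries about.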

It's worth pointing out that anything with patient IDs (even though all personal info is removed) is still not really anonymous "safe" data, and should be treated carefully.

I'm not sure what measures MIMIC takes (beyond just stripping demographic info), but a given patient's pattern of healthcare interactions makes quite a unique fingerprint -- and some parts of that fingerprint are likely public information for some patients.

E.g., imagine a celebrity who (you can find from the tabloids) was treated at X hospital for a sprained ankle on 2012-07-14, and gave birth to a daughter at Hospital Y on 2014-04-01. If her record -- completely "anonymized" -- is in a data set that lets you search for patients matching these two events... it seems fairly likely you'd be able to narrow it down to only a few candidates, or quite likely an exact match. And then once you have her pseudonym/ID, does the rest of the record reveal anything interesting? An abortion no one knew about (possibly not even her partner)? A venereal disease treatment?
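The celebrity example amounts to a simple linkage query. This is a minimal sketch against a made-up pseudonymized data set; the `Event` structure, patient IDs, and records are all hypothetical. Keep only the patients whose records contain every publicly known event, then read off whatever else is in the match's record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes Event hashable, so it can live in sets
class Event:
    hospital: str
    date: str  # ISO 8601, e.g. "2012-07-14"
    kind: str


# Hypothetical "anonymized" data set: pseudonymous patient ID -> events.
records: dict[int, set[Event]] = {
    101: {
        Event("X", "2012-07-14", "sprained ankle"),
        Event("Y", "2014-04-01", "childbirth"),
        Event("Z", "2013-02-10", "clinic visit"),
    },
    102: {Event("X", "2012-07-14", "sprained ankle")},
    103: {Event("Y", "2014-04-01", "childbirth")},
}


def candidates(records: dict[int, set[Event]], known: set[Event]) -> list[int]:
    """Return IDs of patients whose record contains every known event."""
    return [pid for pid, events in records.items() if known <= events]


# Facts available from the tabloids.
known = {
    Event("X", "2012-07-14", "sprained ankle"),
    Event("Y", "2014-04-01", "childbirth"),
}
matches = candidates(records, known)
print(matches)  # [101]
leaked = records[matches[0]] - known  # everything else in her record
```

Each known event alone matches two patients here, but the two together match only one. In a real data set every extra public fact shrinks the candidate set in the same way.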

Even the fact that a patient had an appointment at a given clinic is sensitive data -- e.g., seeing an IVF specialist or an oncologist.

It's a tricky field to navigate.

Is that the video you meant to link? It concerns space debris.