by jrowley 3888 days ago
De-identification is perfect for a lot of medical research, where you're looking for general trends and don't need or want patient-identifying information sitting around on your hard drive. For example, MIMIC-III is a great database for ICU data. You still have patient ID numbers for tracking events involving a specific patient, but all personal info is de-identified.

http://mimic.mit.edu/about/mimic/

1 comment

It's worth pointing out that anything with patient IDs (even though all personal info is removed) is still not really anonymous "safe" data, and should be treated carefully.

I'm not sure what measures MIMIC takes (beyond just stripping demographic info), but a given patient's pattern of healthcare interactions makes quite a unique fingerprint -- and some parts of that fingerprint are likely public information for some patients.

E.g., imagine a celebrity who (you can find from the tabloids) was treated at Hospital X for a sprained ankle on 2012-07-14, and gave birth to a daughter at Hospital Y on 2014-04-01. If her record -- completely "anonymized" -- is in a data set that lets you search for patients matching these two events... it seems fairly likely you'd be able to narrow it down to only a few candidates, or quite possibly an exact match. And then once you have her pseudonym/ID, does the rest of the record reveal anything interesting? An abortion no one knew about (possibly not even her partner)? A venereal disease treatment?
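
To make the matching step concrete, here's a rough sketch on entirely made-up toy records. The IDs, event labels, record layout and the `candidates` helper are all hypothetical; nothing here reflects MIMIC's actual schema or contents. It just intersects the sets of pseudonymous IDs that match each publicly known event:

```python
from datetime import date

# Synthetic toy records: (pseudonymous ID, hospital, event, date).
# All values are invented for illustration only.
RECORDS = [
    ("P001", "X", "sprained ankle", date(2012, 7, 14)),
    ("P001", "Y", "delivery", date(2014, 4, 1)),
    ("P001", "Y", "other visit", date(2013, 1, 9)),
    ("P002", "X", "sprained ankle", date(2012, 7, 14)),
    ("P002", "X", "flu", date(2013, 11, 2)),
    ("P003", "Y", "delivery", date(2014, 4, 1)),
    ("P004", "X", "fracture", date(2012, 7, 14)),
]


def candidates(records, known_events):
    """Return the IDs whose records contain every publicly known event."""
    matches = None
    for event in known_events:
        # IDs with at least one record equal to this (hospital, event, date).
        ids = {pid for pid, *rest in records if tuple(rest) == event}
        matches = ids if matches is None else matches & ids
    return matches or set()


# The two events an outsider could learn from the tabloids.
known = [
    ("X", "sprained ankle", date(2012, 7, 14)),
    ("Y", "delivery", date(2014, 4, 1)),
]

print(candidates(RECORDS, known[:1]))  # one event: still ambiguous
print(candidates(RECORDS, known))      # both events: a single ID remains
```

In this toy set, one event leaves two candidates and the second narrows it to one. Real data sets are far larger, but each additional known event shrinks the candidate set in the same way.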

Even the fact that a patient had an appointment at a given clinic is sensitive data -- e.g., seeing an IVF specialist, an oncologist, etc.

It's a tricky field to navigate.