by jrowley
3888 days ago
De-identification is perfect for a lot of medical research, where you're looking for general trends and don't need or want specific patient-identifying information sitting around on your hard drive. For example, MIMIC-III is a great database for ICU data. You still have patient ID numbers for tracking events involving a specific patient, but all personal info is de-identified. http://mimic.mit.edu/about/mimic/
I'm not sure what measures MIMIC takes (beyond just stripping demographic info), but a given patient's pattern of healthcare interactions makes quite a unique fingerprint -- and some parts of that fingerprint are likely public information for some patients.
E.g., imagine a celebrity who (as you can find from the tabloids) was treated at Hospital X for a sprained ankle on 2012-07-14 and gave birth to a daughter at Hospital Y on 2014-04-01. If her record -- completely "anonymized" -- is in a data set that lets you search for patients matching these two events, you could probably narrow it down to a few candidates, and quite possibly to an exact match. And then once you have her pseudonym/ID, does the rest of the record reveal anything interesting? An abortion no one knew about (possibly not even her partner)? A venereal disease treatment?
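To make the linkage concrete, here's a rough Python sketch on entirely made-up data. The event log, the IDs (`p001` etc.), the field layout, and the `candidates` helper are all hypothetical; nothing here reflects how any real data set is stored or queried, and it ignores protections a real release might apply (shifting dates, coarsening fields, and so on). It only shows that a couple of publicly known events can be enough to pick out one pseudonymous ID.

```python
from collections import defaultdict
from datetime import date

# Synthetic "de-identified" event log: (pseudonymous ID, hospital, event, date).
# No names or demographics, only pseudonyms and events.
EVENTS = [
    ("p001", "X", "sprained ankle", date(2012, 7, 14)),
    ("p001", "Y", "childbirth", date(2014, 4, 1)),
    ("p001", "Y", "oncology consult", date(2015, 2, 3)),
    ("p002", "X", "sprained ankle", date(2012, 7, 14)),
    ("p002", "X", "flu", date(2013, 1, 9)),
    ("p003", "Y", "childbirth", date(2014, 4, 1)),
]

# What an outsider might already know from public sources (hypothetical).
KNOWN = {
    ("X", "sprained ankle", date(2012, 7, 14)),
    ("Y", "childbirth", date(2014, 4, 1)),
}


def candidates(events, known):
    """Return the pseudonymous IDs whose records contain every known event."""
    seen = defaultdict(set)
    for pid, hospital, event, day in events:
        seen[pid].add((hospital, event, day))
    # Subset test: every known event must appear in the patient's record.
    return sorted(pid for pid, evs in seen.items() if known <= evs)


matches = candidates(EVENTS, KNOWN)
print(matches)

# With a single match, everything else filed under that ID is now linked
# to a named person, including events that were never public.
if len(matches) == 1:
    for pid, hospital, event, day in EVENTS:
        if pid == matches[0] and (hospital, event, day) not in KNOWN:
            print(pid, hospital, event, day.isoformat())
```

Each known event on its own matches two patients in this toy log; only the combination is unique, which is the fingerprint effect.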
Even the fact that a patient had an appointment at a given clinic is sensitive data -- e.g., seeing an IVF specialist or an oncologist.
It's a tricky field to navigate.