| Anyone who believes they can anonymize data automatically will be very disappointed. There are so many ways in which data can point to individuals that you'd need to process every data point with a lot of care and investigation. For example, a rare medical condition can be a good identification tool if the adversary knows the relation between that condition and a person. How would an automatic tool know whether a medical condition is rare enough? How would it know whether such information is already available elsewhere? Information may also be transferred as images or audio. What if the database simply stores these as blobs and only the application knows what format is used inside the blob? Or, even if the format is known, what about a format such as DICOM, where it's hard to tell whether the information is significant or not? You can often recognize an MRI machine from features of the images it produces, e.g., artifacts that show up in every image it takes. DICOM files usually contain information such as the date the image was taken, besides the patient's name. By connecting the date and the machine, one may be able to infer which patient was pictured, if one also knows that the patient paid for a cab ride around that time. Or, even simpler: sometimes there is text burned into DICOM images that identifies the patient in some way. Or take my office: one woman and 30 men work there. Surprisingly, gender becomes a very precise tool for identifying people. |
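The office example is essentially a k-anonymity failure: once names are removed, the remaining attributes (quasi-identifiers) split the records into groups, and any group of size 1 points to exactly one person. Below is a minimal Python sketch, assuming records are plain dicts; the function names and the roster are hypothetical and only for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the smallest group size when records are grouped by the given quasi-identifier fields."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values())


def unique_records(records, quasi_identifiers):
    """Return the records whose quasi-identifier combination occurs exactly once (re-identifiable)."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [r for r, key in zip(records, keys) if counts[key] == 1]


# Hypothetical office roster with names already stripped: 1 woman, 30 men.
office = [{"gender": "F"}] + [{"gender": "M"} for _ in range(30)]

print(k_anonymity(office, ["gender"]))           # 1
print(len(unique_records(office, ["gender"])))   # 1
```

A k of 1 means at least one record is unique on those attributes alone. Deciding which fields count as quasi-identifiers, and what an adversary already knows, is exactly the part no automatic tool can settle.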
What obfuscation mainly does is remove the PII that neither side wants to handle before the data gets transferred, so that the data is "safe" and the receiver no longer bears the burden of stewardship over PII.
Contrary to a lot of handwringing on the internet, almost everyone who handles your data couldn't care less about you as a person. Their overwhelming interest in you is as a bag of attributes that they can statistically correlate with other bags of attributes. It's a relief for them if they can scrub all the PII from their databases while retaining all the other attributes they care about. Of course, the few entities that do care about deanonymization are the ones that make this entire process so difficult.
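The scrubbing step described above can be sketched in a few lines of Python, assuming records are dicts and the set of fields to drop is known up front. The field names and the sample record are hypothetical.

```python
# Hypothetical set of fields treated as direct identifiers (PII) in this sketch.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}


def scrub(record, pii_fields=DIRECT_IDENTIFIERS):
    """Return a new dict without the direct-identifier fields; the input record is not modified."""
    return {k: v for k, v in record.items() if k not in pii_fields}


patient = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "age": 42,
    "zip": "12345",
    "diagnosis": "asthma",
}

print(scrub(patient))  # {'age': 42, 'zip': '12345', 'diagnosis': 'asthma'}
```

The fields that survive are the bag of attributes the receiver actually wants. Nothing in this step stops someone who is determined to deanonymize from combining them to re-identify a person.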