by EA-3167 456 days ago
To what extent and using what method is it "de-identified"? Plenty of such schemes are very easy to circumvent, especially with a large enough pool of data. Given the nature of genetics in particular, positively identifying a single case can be used to unmask whole families. Depending on the anonymization, this is also a task that 'AI' would be very well suited to.

https://www.23andme.com/about/individual-data-consent

Basically, if you imagine this as a table of "user's name, date of birth, and address" keys mapping to genomic and other data, the key was replaced with a random identifier that could not be trivially joined to recover the user name, date of birth, and address.
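A minimal sketch of the keyed-table idea above, assuming records keyed by a (name, date of birth, address) tuple. The function name, field names, and data are hypothetical and are not 23andMe's actual pipeline. The token is drawn at random rather than computed as a hash of the identity, because anyone holding a candidate name, birth date, and address could simply hash their guess and look for a match.

```python
import secrets


def pseudonymize(records):
    """Replace each identifying key with a random token (hypothetical helper).

    records maps an identifying key (name, date_of_birth, address) to the
    associated data. Returns (deidentified, key_map): deidentified maps
    token -> data, and key_map maps token -> original key. key_map is the
    only link back to the person, so it must be held separately or destroyed.
    """
    deidentified = {}
    key_map = {}
    for identity, data in records.items():
        # 128 random bits, independent of the identity. A hash of
        # (name, dob, address) could be reversed by hashing guesses.
        token = secrets.token_hex(16)
        while token in deidentified:  # collision is astronomically unlikely
            token = secrets.token_hex(16)
        deidentified[token] = data
        key_map[token] = identity
    return deidentified, key_map


# Made-up example records; "snp_1" stands in for genomic fields.
records = {
    ("Alice Example", "1980-01-01", "1 Main St"): {"snp_1": "AG"},
    ("Bob Example", "1975-06-15", "2 Oak Ave"): {"snp_1": "GG"},
}
deidentified, key_map = pseudonymize(records)
```

Note that this only breaks the trivial join on the key. The data values themselves (especially genotypes) are left untouched and can still be distinguishing.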

These systems are not robust against motivated and capitalized adversaries.

I can go to a data broker and purchase access to de-identified EMR data for most of the U.S. population. There are much more useful de-identified datasets around than ours, if someone is motivated to try to re-identify those datasets. That data is all bought and sold without anyone's consent and this is all fine under HIPAA.
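A toy sketch of why such schemes are weak against a motivated adversary: a linkage attack that joins de-identified rows to a separately obtained, identified dataset on quasi-identifiers. It assumes the de-identified rows still carry ZIP code, birth date, and sex; the function name, column names, tokens, and records are all made up for illustration.

```python
from collections import defaultdict

QUASI_IDS = ("zip", "birth_date", "sex")  # hypothetical column names


def link(deidentified_rows, identified_roster, quasi_ids=QUASI_IDS):
    """Toy linkage attack: re-identify rows whose quasi-identifiers match
    exactly one person in an identified dataset (e.g. a purchased list).

    Returns {token: name} for every uniquely matched row.
    """
    # Index the identified roster by its quasi-identifier tuple.
    index = defaultdict(list)
    for person in identified_roster:
        index[tuple(person[q] for q in quasi_ids)].append(person["name"])

    matches = {}
    for row in deidentified_rows:
        # .get() does not insert a default, so unmatched keys yield [].
        candidates = index.get(tuple(row[q] for q in quasi_ids), [])
        if len(candidates) == 1:  # unique combination => re-identified
            matches[row["token"]] = candidates[0]
    return matches


deidentified_rows = [
    {"token": "a1f3", "zip": "02139", "birth_date": "1980-01-01", "sex": "F", "dx": "dx_1"},
    {"token": "9c2e", "zip": "02139", "birth_date": "1975-06-15", "sex": "M", "dx": "dx_2"},
]
identified_roster = [
    {"name": "Alice Example", "zip": "02139", "birth_date": "1980-01-01", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1975-06-15", "sex": "M"},
    {"name": "Bert Example", "zip": "02139", "birth_date": "1975-06-15", "sex": "M"},
]
print(link(deidentified_rows, identified_roster))  # {'a1f3': 'Alice Example'}
```

Here row "9c2e" survives only because two people share its quasi-identifiers; with more auxiliary columns, or with genomic data where the genotype itself is the distinguishing field, that ambiguity tends to disappear.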
I wasn't trying to convince anybody otherwise. I find the noise about 23andMe's data pretty uninteresting. I published my own genome (through PGP) for anybody to download, and I know that people have identified me from my post https://news.ycombinator.com/item?id=7641201 and other comments.
That's more or less what I expected. Ah well, the odds that this becomes something of significance to most people seem remote, but either way you can't unring the bell.
Here "de-identified" means stripped of PII (name, address, phone number, email, etc). You are correct that genetic information is intrinsically identifiABLE (in the sense that it is stable and uniquely distinguishing for individuals). When we've shared individual-level data with a partner, it was with consent of the participants involved, and under a contract that prohibits re-identification.