Hacker News
by LatteLazy 1849 days ago
People should note that this data cannot really be anonymised, because a postcode, date of birth and sex are enough to identify the majority of people.

"A 2000 study found that 87 percent of the U.S. population can be identified using a combination of their gender, birthdate and zip code."

https://en.wikipedia.org/wiki/Data_re-identification

UK postcodes are more specific than US zip codes, I believe, so the figure here could be even higher.
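
To make the point concrete, here is a rough sketch of how you could measure this on any data set: count how many records share each (postcode, date of birth, sex) combination. The records below are made up for illustration and the function names are my own, not from any study or NHS tool.

```python
from collections import Counter

# Hypothetical toy records: (postcode, date_of_birth, sex). Not real data.
records = [
    ("AB1 2CD", "1984-03-12", "F"),
    ("AB1 2CD", "1984-03-12", "F"),
    ("AB1 2CD", "1990-07-01", "M"),
    ("EF3 4GH", "1975-11-30", "M"),
    ("EF3 4GH", "2001-02-14", "F"),
]

def unique_fraction(rows):
    """Fraction of rows whose field combination appears exactly once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)

def smallest_group(rows):
    """Size of the smallest group sharing a combination (the k in k-anonymity)."""
    return min(Counter(rows).values())

print(unique_fraction(records))
print(smallest_group(records))
```

Any record in a group of size 1 is pinned down by those three fields alone, even with the name removed.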

3 comments

It seems the NHS will still be able to identify patients. FTA:

"Data that directly identifies patients will be replaced with unique codes in the new data set, but the NHS will hold the keys to unlock the codes “in certain circumstances, and where there is a valid legal reason”, according to its website. "

What they don't seem to cover is what they think directly identifying data means.
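
For anyone wondering what "replaced with unique codes" might look like in practice, here is a minimal sketch. It assumes a keyed hash (HMAC) over a patient ID, which is my assumption only; the article does not say how the NHS generates its codes, and the field names and key are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data controller.
SECRET_KEY = b"hypothetical-key-held-by-the-data-controller"

# Fields treated as "directly identifying" in this sketch (an assumption).
DIRECT_IDENTIFIERS = ("patient_id", "name")

def pseudonymise(record, key=SECRET_KEY):
    """Swap direct identifiers for a keyed code; leave all other fields as-is."""
    digest = hmac.new(key, record["patient_id"].encode(), hashlib.sha256)
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["code"] = digest.hexdigest()[:16]
    return out

record = {
    "patient_id": "0000000000",
    "name": "Jane Doe",
    "postcode": "AB1 2CD",
    "date_of_birth": "1984-03-12",
    "sex": "F",
}
pseudo = pseudonymise(record)
print(pseudo)
```

Whoever holds the key can recompute the code from a patient ID and re-link the record. Note that postcode, date of birth and sex pass through untouched unless someone decides they also count as directly identifying.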

Does anyone actually know what that means? I wouldn't know from a medical record how much data would need to be removed to make it anonymous. It likely depends on the record. And there are different answers that can both be right (so which are they using?).

Why not remove postcodes?
Because you can do something very similar with combinations of other fields in the data. That's the problem here: no one knows which combinations of details deanonymise a record. So what do you remove, all of them? So the claim that it is anonymous is BS. And once the records are out there, there is no way to get them back...

Edit: as an example, I was in a car accident as a 17-year-old and broke my jaw. If you Google my name and the location, you'll see the news article. I was the only person treated at the local hospital for a broken jaw that day. You just deanonymised me.

Or go again: I'm the only person from my town who went to my university in the year I went. Just look for someone treated in <home area> term time and <uni medical clinic> term time 2002-2008.
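
Both examples are the same attack: filter the "anonymous" records on a few facts you already know from elsewhere. A toy sketch, with invented records and field names (nothing here reflects the real data set's layout):

```python
# Hypothetical pseudonymised treatment records. Names are gone, codes remain.
treatments = [
    {"code": "a1", "date": "2001-05-04", "hospital": "Town General", "diagnosis": "broken jaw"},
    {"code": "b2", "date": "2001-05-04", "hospital": "Town General", "diagnosis": "sprained ankle"},
    {"code": "c3", "date": "2001-05-05", "hospital": "Town General", "diagnosis": "broken jaw"},
]

def link(records, **known):
    """Return the records that match every publicly known field."""
    return [r for r in records if all(r.get(k) == v for k, v in known.items())]

# Facts an attacker could read in a local news article.
matches = link(
    treatments,
    date="2001-05-04",
    hospital="Town General",
    diagnosis="broken jaw",
)
print(matches)
```

If exactly one record comes back, its code now belongs to a named person, and every other entry under that code is exposed with it.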

For example, to book the NHS vaccine you only need a person's postcode, first and last name, and date of birth. All of this information is easily available, for example from Companies House.