by jjgreen
792 days ago
How many Caucasian males born between 12/01/97 and 12/01/97 in MSOA region 12523 with <some not too common condition I know you have> have <embarrassing condition>? [1]

My point is that aggregated data becomes an indicator function on an individual if you allow arbitrary queries; you can then extract all information on that individual by exhaustion: for each <embarrassing condition>, run [1].
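To make the "indicator function" point concrete, here is a toy sketch in Python. Everything in it is hypothetical: the `Record` schema, the `count` interface, the made-up data, and the condition names are illustrations, not any real system's API. The only thing the "attacker" can do is ask for counts, yet once a filter matches exactly one row, every further count is 0 or 1 and leaks a yes/no fact about that person.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Record:
    # Hypothetical schema: only the fields the example query filters on.
    ethnicity: str
    sex: str
    dob: str                  # ISO date string
    msoa: str                 # MSOA area, kept as the bare number from the query
    conditions: FrozenSet[str]


def count(records: Iterable[Record], predicate: Callable[[Record], bool]) -> int:
    """The only interface exposed: an aggregate count over a filter."""
    return sum(1 for r in records if predicate(r))


def extract_by_exhaustion(records: List[Record],
                          quasi_identifier: Callable[[Record], bool],
                          candidates: Iterable[str]) -> Set[str]:
    """Recover one person's conditions using nothing but count queries."""
    # If the filter matches exactly one row, it acts as an indicator
    # function on that individual.
    if count(records, quasi_identifier) != 1:
        raise ValueError("filter does not single out exactly one individual")
    # Ask one count per candidate condition; a result of 1 means "they have it".
    return {
        c for c in candidates
        if count(records, lambda r: quasi_identifier(r) and c in r.conditions) == 1
    }


# Entirely made-up toy data.
db = [
    Record("caucasian", "M", "1997-01-12", "12523", frozenset({"rare_x", "cond_a"})),
    Record("caucasian", "M", "1997-01-12", "99999", frozenset({"cond_b"})),
    Record("caucasian", "F", "1997-01-12", "12523", frozenset({"rare_x"})),
    Record("asian", "M", "1980-05-05", "12523", frozenset({"cond_a", "cond_b"})),
]


def target(r: Record) -> bool:
    # The narrow filter: demographics, a single date of birth (12/01/97 read
    # as a UK-style date, which is an assumption), the area, and a
    # not-too-common condition the attacker already knows about.
    return (r.ethnicity == "caucasian" and r.sex == "M"
            and r.dob == "1997-01-12" and r.msoa == "12523"
            and "rare_x" in r.conditions)


leaked = extract_by_exhaustion(db, target, ["cond_a", "cond_b", "cond_c"])
print(leaked)  # {'cond_a'}
```

None of the individual answers is a record, but together they rebuild one.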
The leader of the OpenSAFELY project produced the Goldacre review [1] for the UK Government, which discusses this risk in detail. Here's a quote from the executive summary:
That's why there are multiple layers of security within the OpenSAFELY system. Other layers include: multiple authentications from separate organisations required to view outputs; various limits on the outputs you can view; a full public audit trail of all code run and all outputs viewed; and separation between the systems used to run code and to view outputs (so a compromise of one system doesn't affect the other). And finally, you cannot release an output from a secure backend until it has been checked for potential disclosivity by two ONS[2]-trained reviewers.

Like any computer system, you cannot remove all risk. But you can reduce it significantly, and make all activity public and auditable. And it's a damn sight better than current practice for medical data research, tbh.

We try as much as possible to provide data minimisation: you only get the minimal set of data you need to answer your question. Whereas in the UK, at least, researchers are most of the time just given access to everything.
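As an illustration of what "limits on the outputs you can view" can look like, here is a minimal sketch of small-cell suppression with secondary suppression, a common statistical disclosure control technique. This is not OpenSAFELY's actual rule set: the function name, the threshold of 5, and the secondary-suppression strategy are all assumptions for the example.

```python
from typing import Dict, Optional

# Assumed threshold, for illustration only; not OpenSAFELY's or ONS's actual rule.
THRESHOLD = 5


def suppress_small_counts(table: Dict[str, int],
                          threshold: int = THRESHOLD) -> Dict[str, Optional[int]]:
    """Hide any count below `threshold` (None means 'suppressed')."""
    out: Dict[str, Optional[int]] = {
        k: (v if v >= threshold else None) for k, v in table.items()
    }
    hidden = [k for k, v in out.items() if v is None]
    # Secondary suppression: if exactly one cell is hidden and the table total
    # is known, that cell can be recovered by subtraction, so also hide the
    # smallest remaining cell.
    if len(hidden) == 1:
        visible = [k for k, v in out.items() if v is not None]
        if visible:
            out[min(visible, key=lambda k: table[k])] = None
    return out


print(suppress_small_counts({"cond_a": 120, "cond_b": 3, "cond_c": 40}))
# {'cond_a': 120, 'cond_b': None, 'cond_c': None}
```

A rule like this blunts the single-person queries in the parent comment, since a count of 1 never gets out. On its own it does not stop differencing attacks that subtract two large, publishable counts, which is why it would only ever be one layer alongside human output checking.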
[1]https://www.gov.uk/government/publications/better-broader-sa...
[2] Office for National Statistics, UK equivalent of the US Census Bureau
EDIT: fixed a reference