Hacker News
by ALittleLight 2456 days ago
Releasing your data should be a requirement for publication. If the original author had wanted to keep this a secret, he could've withheld his data and nobody would've been able to correct him. There simply would've been discrepant studies.

I see where you're coming from, but would the subjects be comfortable with all their data becoming public?
I think you could, and should anyway, make the data anonymous. Just give every participant a GUID as their participant ID and add a step that purges personally identifiable information. Then you can share records without revealing identities.
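A minimal Python sketch of the pseudonymization step described above, assuming records arrive as plain dicts. The field names in `PII_FIELDS` and the `pseudonymize` helper are hypothetical; a real study would need its own list of identifying fields. It uses random `uuid.uuid4()` values so the ID carries no information about the participant, and returns the GUID-to-participant key table separately so it can stay private.

```python
import uuid

# Hypothetical set of direct identifiers; a real study would define its own.
PII_FIELDS = {"name", "email", "phone", "address"}


def pseudonymize(records):
    """Replace each participant's identity with a random GUID and drop PII fields.

    Returns (shared, key_table): the records safe to publish, and a private
    mapping from GUID to the record's position in the input list.
    """
    shared = []
    key_table = {}
    for index, record in enumerate(records):
        participant_id = str(uuid.uuid4())  # random, not derived from identity
        key_table[participant_id] = index
        clean = {k: v for k, v in record.items() if k not in PII_FIELDS}
        clean["participant_id"] = participant_id
        shared.append(clean)
    return shared, key_table


# Made-up example records.
records = [
    {"name": "Alice", "email": "alice@example.com", "nationality": "DE", "score": 7},
    {"name": "Bob", "email": "bob@example.com", "nationality": "FR", "score": 4},
]
shared, key_table = pseudonymize(records)
for row in shared:
    print(row)
```

Note that this only drops the listed direct identifiers; every other column is released unchanged.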
That didn’t work for the AOL research several years ago. https://arstechnica.com/tech-policy/2009/09/your-secrets-liv...
Making things like medical records actually anonymous, especially in the face of bad actors, is an unsolved problem.
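To illustrate why swapping names for GUIDs is not enough on its own, here is a hedged Python sketch of a linkage attack: the columns left in the release (assumed here to be ZIP code, birth date and sex) are joined against a public dataset that still carries names. All records, names, and the `link` helper are made up for illustration.

```python
# Hypothetical quasi-identifier columns left in a "pseudonymized" release.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(released, public):
    """Match released rows to named public rows on the quasi-identifiers.

    Returns {participant_id: name} for every released row that matches
    exactly one public record.
    """
    index = {}
    for person in public:
        key = tuple(person[q] for q in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(person["name"])

    matches = {}
    for row in released:
        key = tuple(row[q] for q in QUASI_IDENTIFIERS)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # a unique match: the GUID no longer hides this person
            matches[row["participant_id"]] = candidates[0]
    return matches


released = [
    {"participant_id": "guid-1", "zip": "12345", "birth_date": "1961-03-14", "sex": "F", "diagnosis": "X"},
    {"participant_id": "guid-2", "zip": "12346", "birth_date": "1980-01-01", "sex": "M", "diagnosis": "Y"},
]
public = [
    {"name": "Jane Doe", "zip": "12345", "birth_date": "1961-03-14", "sex": "F"},
    {"name": "John Roe", "zip": "12346", "birth_date": "1980-01-01", "sex": "M"},
    {"name": "Jim Poe", "zip": "12346", "birth_date": "1980-01-01", "sex": "M"},
]
print(link(released, public))  # {'guid-1': 'Jane Doe'}
```

Here `guid-2` survives only because two public records share its quasi-identifiers; `guid-1` is uniquely re-identified even though its name was never released.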
Anonymizing data is, yes, a difficult problem, but aggregated data in particular can be, and has been, reliably anonymized. For example, the problem with this dataset would have been visible in aggregated data (e.g. data aggregated by nationality).
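A minimal Python sketch of releasing only per-nationality aggregates, assuming records are dicts with hypothetical `nationality` and `score` fields. It also applies small-cell suppression: groups with fewer than `MIN_GROUP_SIZE` members are withheld, because a tiny group can reveal an individual's value. The threshold of 5 is an assumed, illustrative choice, and suppression is a convention rather than a formal privacy guarantee.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical minimum group size; smaller groups are suppressed entirely.
MIN_GROUP_SIZE = 5


def aggregate_by(records, group_field, value_field, min_size=MIN_GROUP_SIZE):
    """Return {group: {"n": count, "mean": average}}, or None for suppressed groups."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_field]].append(record[value_field])

    summary = {}
    for group, values in sorted(groups.items()):
        if len(values) < min_size:
            summary[group] = None  # too few members to publish safely
        else:
            summary[group] = {"n": len(values), "mean": mean(values)}
    return summary


# Made-up example data.
records = (
    [{"nationality": "DE", "score": s} for s in (6, 7, 8, 7, 7)]
    + [{"nationality": "FR", "score": s} for s in (3, 4, 5, 4, 4, 4)]
    + [{"nationality": "IT", "score": 9}]
)
summary = aggregate_by(records, "nationality", "score")
print(summary)
```

A per-group discrepancy (here, DE versus FR) stays visible in the published summary, while the single IT participant's value is not disclosed.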