by eesmith 1057 days ago
Just about everyone accepts that it's reasonable for some government-collected information to be kept private. FOIA requests exclude "personnel and medical files and similar files the disclosure of which would constitute a clearly unwarranted invasion of personal privacy". https://www.ecfr.gov/current/title-21/chapter-I/subchapter-A...

In this case it was for "student-level data that detail the demographic information and the performance records over time of California’s 5.8 million students but without any names or identifying information. That data is the gold standard for accurate research. A partnership contract details the department’s commitments and researchers’ responsibilities, including strong assurances they will have security protections in place to protect students’ privacy and anonymity."

The thing about this sort of data is, removing PII from the dataset doesn't make it fully or even sufficiently anonymous. If there's only one Pacific Islander student in the Shasta Union High School District then it's easy to figure out who that is by combining it with other public data.

Quoting https://en.wikipedia.org/wiki/Differential_privacy :

] Statistical organizations have long collected information under a promise of confidentiality that the information provided will be used for statistical purposes, but that the publications will not produce information that can be traced back to a specific individual or establishment. To accomplish this goal, statistical organizations have long suppressed information in their publications. For example, in a table presenting the sales of each business in a town grouped by business category, a cell that has information from only one company might be suppressed, in order to maintain the confidentiality of that company's specific sales.
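As a rough sketch of what that kind of small-cell suppression looks like in practice (the records, the field names, and the threshold of 3 are all made up for illustration, not taken from any agency's actual rules):

```python
from collections import Counter

def suppress_small_cells(records, keys, min_count=3):
    """Count records per combination of `keys`.

    Cells with fewer than `min_count` records are replaced by None
    (i.e. suppressed). The threshold of 3 is an assumption chosen
    for illustration.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: (n if n >= min_count else None) for cell, n in counts.items()}

# Hypothetical records with the names already removed.
records = [
    {"district": "A", "ethnicity": "X"},
    {"district": "A", "ethnicity": "X"},
    {"district": "A", "ethnicity": "X"},
    {"district": "A", "ethnicity": "Y"},  # the only "Y" in district A
    {"district": "B", "ethnicity": "X"},
    {"district": "B", "ethnicity": "X"},
    {"district": "B", "ethnicity": "X"},
    {"district": "B", "ethnicity": "X"},
]

table = suppress_small_cells(records, ["district", "ethnicity"])
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```

The lone ("A", "Y") record is exactly the "only one student in the district" case: the name is gone, but the cell still points at one person. Note that suppressing just that cell isn't enough if row or column totals are also published, since the hidden value can be recovered by subtraction, so real disclosure control usually has to suppress additional cells as well.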

The clear justification for keeping this information private is that the government won't get sufficiently useful data without this promise. The United States Census Bureau released "confidential" information about draft evaders and Japanese-Americans; if you think they might do that again, perhaps you'll lie when answering some of the questions.

People who receive this sort of information are required to take special care to maintain the needed level of anonymity.

Of course, there's no reason why this should be used to muzzle researchers in completely unrelated fields.