by antpls
2484 days ago
After reading all your links, I'm still not sure why or where differential privacy (DP) is needed. 1) How could aggregated data (mean, min, max) be used by attackers? Isn't aggregated data already private? For example, the Google Postgres extension returns aggregated data, so why is DP required there? 2) When sharing entire databases, if all the PII is removed, why does it matter that records from two databases can be matched? Yes, we can correlate the two databases, but if PII were never gathered or stored in any database, there would be no privacy issue in the first place.
1) Note that the "min/max" example trivially leaks individual information: for example, releasing the max salary of a company's employees leaks the salary of the CEO. More generally, there have been numerous attacks on privacy notions based purely on aggregate data. One of my favorites is this one: https://blog.acolyer.org/2017/05/15/trajectory-recovery-from...
2) Typically, PII is not the only thing that can be used to reidentify someone, and matching records across different databases can sometimes reveal sensitive information about people. One example: https://www.cs.cornell.edu/~shmat/netflix-faq.html
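
To make both points concrete, here is a small sketch in Python. Everything in it is invented for illustration: the names, salaries, records, and the choice of zip code, birth year, and sex as matching columns are assumptions, not data or methods from the linked articles. It shows (1) how a max, or the difference between two means, pins down one person's value, and (2) how a table with no names in it can still be joined to a public table on the remaining columns.

```python
# Toy data only: every name and number below is hypothetical.
salaries = {"ceo": 900_000, "alice": 85_000, "bob": 70_000, "carol": 92_000}


def max_salary(rows):
    """Aggregate query: highest salary in the group."""
    return max(rows.values())


def mean_salary(rows):
    """Aggregate query: average salary in the group."""
    return sum(rows.values()) / len(rows)


# 1a) The max is exactly one person's value (here, the CEO's).
leaked_ceo = max_salary(salaries)

# 1b) Differencing: the same mean query asked before and after bob
#     joined the group reveals bob's salary, with no row ever released.
before = {name: pay for name, pay in salaries.items() if name != "bob"}
leaked_bob = round(
    mean_salary(salaries) * len(salaries) - mean_salary(before) * len(before)
)

# 2) Linkage: the "anonymized" table has no names, but the remaining
#    columns (quasi-identifiers) can match rows in a public table.
QUASI = ("zip", "birth_year", "sex")

anonymized = [
    {"zip": "02139", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
public = [
    {"name": "Carol", "zip": "02139", "birth_year": 1980, "sex": "F"},
    {"name": "Dave", "zip": "94110", "birth_year": 1975, "sex": "M"},
]


def link(anon_rows, public_rows):
    """Return {name: diagnosis} for anonymized rows matching exactly one name."""
    index = {}
    for person in public_rows:
        key = tuple(person[col] for col in QUASI)
        index.setdefault(key, []).append(person["name"])
    matches = {}
    for row in anon_rows:
        names = index.get(tuple(row[col] for col in QUASI), [])
        if len(names) == 1:  # a unique match means re-identification
            matches[names[0]] = row["diagnosis"]
    return matches


reidentified = link(anonymized, public)
```

Roughly speaking, DP addresses cases like 1a and 1b by adding calibrated noise to each answer, so that no single person's presence changes the output by much.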