by ThePhysicist
2323 days ago
We're building an analytics system based on differential privacy / randomization of data. It's possible, but there are many limitations and caveats, at least if you really care about privacy and aren't just applying differential privacy as a PR move. Most systems that implement differential privacy use it for simple aggregation queries, for which it works well. It doesn't work well for more complex queries or high-dimensional data though, at least not if you choose a reasonably secure epsilon: either the data will no longer be useful, or the individual the data belongs to won't be reliably protected from identification or inference. After spending three years working on privacy technologies, I'm convinced that anonymizing high-dimensional datasets (say, more than 1000 bits of information entropy per individual) is simply not possible for information-theoretic reasons. The best we can do for such data is pseudonymization or deletion.
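To make the "simple aggregation works, high-dimensional doesn't" point concrete, here is a rough sketch of the textbook Laplace mechanism for a counting query, plus the basic sequential-composition arithmetic. This is not our system's code; the function names (`laplace_noise`, `private_count`, `noise_scale`) and the even split of the budget across queries are assumptions made purely for illustration.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponentials with mean `scale`
    is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Epsilon-DP count via the Laplace mechanism (illustrative helper).

    A counting query has sensitivity 1: adding or removing one
    individual changes the true count by at most 1.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def noise_scale(total_epsilon: float, num_queries: int) -> float:
    """Laplace scale per count when a total budget is split evenly.

    Assumes basic sequential composition: k queries at epsilon/k each
    consume epsilon in total, so each one gets noise of scale k/epsilon.
    """
    return num_queries / total_epsilon


if __name__ == "__main__":
    rng = random.Random()
    ages = [rng.randint(18, 90) for _ in range(10_000)]

    # One aggregate over many people: noise of scale 1 barely matters.
    print("noisy count of over-65s:", private_count(ages, lambda a: a > 65, 1.0, rng))

    # Same total budget spread over many attributes: the noise scale
    # grows linearly with the number of queries.
    for k in (1, 10, 1000):
        print(f"{k} queries at total epsilon 1.0 -> scale {noise_scale(1.0, k)}")
```

With a single count over thousands of people, noise of scale 1/epsilon is negligible. Once the same budget has to cover hundreds or thousands of attributes per individual, the per-query noise swamps the signal unless you raise epsilon, which is the trade-off described above. Tighter composition bounds exist, so treat the linear scaling here as the simplest case rather than the best achievable.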