by ThePhysicist
3103 days ago
Personally, I wouldn't be that pessimistic about data anonymization. It's entirely possible to robustly anonymize low-dimensional data sets and bound an attacker's information gain to a chosen value, even when the attacker knows all non-sensitive attributes in the data set. With k-anonymity, for example (with additional l-diversity (or better) and t-closeness criteria), the resulting data is very robust against attacks, provided you specify your sensitive attributes correctly. Of course there is more to keep in mind, e.g. repeatedly anonymizing different versions of the same data set can leak data.
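To make the terms concrete, here is a minimal Python sketch of how one might measure k-anonymity and the simplest ("distinct") form of l-diversity on an already generalized table. The column names, the toy rows, and the helper names `k_anonymity` and `l_diversity` are all made up for illustration and do not come from any particular library.

```python
from collections import Counter, defaultdict

def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest equivalence class over the quasi-identifiers."""
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(classes.values())

def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values in any equivalence class."""
    classes = defaultdict(set)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].add(row[sensitive])
    return min(len(values) for values in classes.values())

# Hypothetical, already generalized records (illustrative data only).
rows = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
]
QI = ["zip", "age"]

k = k_anonymity(rows, QI)               # every class has 2 rows
l = l_diversity(rows, QI, "diagnosis")  # the second class holds only "flu"
print(f"k = {k}, l = {l}")
```

In this toy table every equivalence class has two rows, yet one class contains a single diagnosis, so anyone known to be in that class is exposed. That gap is what the additional l-diversity criterion is meant to close.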
1. I've never seen a formal definition of security that k-anon supposedly satisfies. While I personally really like formal guarantees, maybe one might argue this wouldn't be so bad absent concrete problems with the definition. Which leads us to...
2. K-anon doesn't compose. The JOIN of two databases, each k-anonymized, can be 1-anonymous (i.e., no anonymity), no matter what k is.
3. The distinction between quasi-identifiers and sensitive attributes (central to the whole framework) is worse than meaningless: it is misleading. Every sensitive attribute is a quasi-identifier given the right auxiliary datasets. Using k-anon essentially requires one to determine a priori which additional datasets will be used to attack the k-anonymized dataset.
4. My understanding of the modified versions (l-diversity, t-closeness, etc.) is less developed, but I believe they suffer from similar weaknesses. The weaknesses are obscured by the additional definitional complexity.
(Edit: typos and autocorrect)
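The composition failure in point 2 can be illustrated with a small Python sketch. This is an intersection-style attack on two independent releases, which is one way the failure can play out and not necessarily the exact JOIN construction meant above. The two hospital tables, their generalization schemes, and the helper `candidates` are hypothetical. The assumption is that the attacker knows the target's zip code and age, and knows the target appears in both releases.

```python
def candidates(release, generalized_qi):
    """Sensitive values in the equivalence class matching the target."""
    return {
        row["diagnosis"]
        for row in release
        if all(row[key] == value for key, value in generalized_qi.items())
    }

# Two releases covering the same person, each 2-anonymous and 2-diverse,
# but generalized differently (illustrative data only).
hospital_a = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "cancer"},
    {"zip": "148**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
]
hospital_b = [
    {"zip": "1305*", "age": "<30", "diagnosis": "cancer"},
    {"zip": "1305*", "age": "<30", "diagnosis": "asthma"},
    {"zip": "1306*", "age": ">=30", "diagnosis": "flu"},
    {"zip": "1306*", "age": ">=30", "diagnosis": "diabetes"},
]

# Target: zip 13053, age 28, mapped into each release's generalization.
in_a = candidates(hospital_a, {"zip": "130**", "age": "20-29"})
in_b = candidates(hospital_b, {"zip": "1305*", "age": "<30"})

leaked = in_a & in_b  # values consistent with both releases
print(leaked)
```

Each release on its own leaves two candidate diagnoses for the target, but only one value is consistent with both, so combining them removes the ambiguity entirely.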