by schoen
3663 days ago
You could talk about whether something is unique. For example, Social Security Numbers, telephone numbers, and IP addresses are designed to be unique. By contrast, given names, ages, or favorite colors are not unique in isolation. Unfortunately, that turned out not to be a hard-and-fast distinction, because things that are not unique in isolation are often unique in combination.

https://en.wikipedia.org/wiki/Quasi-identifier

The Netflix deanonymization paper discusses how things (how much you like a movie) that are very non-unique in isolation can be highly identifying when you have an extremely large number of them. Arvind Narayanan (one of the authors) has given a few discussions of the problem of dimensionality for privacy. One way of thinking of it is that there's an unfathomably large amount of volume in a very high-dimensional space, so there's an extremely large amount of opportunity for points in it to be very far away from each other, even if there's nothing especially "atypical" about the individual points.

This is closely related to the "curse of dimensionality":

https://en.wikipedia.org/wiki/Curse_of_dimensionality

The curse is often stated from the analyst's perspective, when hoping to find patterns. The phenomenon Narayanan is describing is more from the perspective of the individual whose data are in a database and who hopes to appear similar to other individuals in order to retain anonymity, yet turns out to be very distinctive merely because of the number of dimensions.
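To make the "unique in combination" point concrete, here is a minimal sketch on purely synthetic data. It is not taken from the Netflix paper or from any real dataset; the function names (fraction_unique, synthetic_population) and the parameters (10,000 people, 20 attributes, 5 possible values each) are made up for illustration, and the attributes are assumed to be independent and uniformly distributed, which real attributes generally are not.

```python
import random
from collections import Counter


def fraction_unique(records, columns):
    """Return the fraction of records whose values on `columns`
    are shared with no other record."""
    columns = list(columns)
    keys = [tuple(record[c] for c in columns) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)


def synthetic_population(n_people, n_attributes, n_values, seed=0):
    """Hypothetical population: every attribute is drawn independently
    and uniformly from n_values options (an illustrative assumption)."""
    rng = random.Random(seed)
    return [
        tuple(rng.randrange(n_values) for _ in range(n_attributes))
        for _ in range(n_people)
    ]


people = synthetic_population(n_people=10_000, n_attributes=20, n_values=5)

# Look at the first k attributes together and see how many people
# are already singled out by that combination alone.
for k in (1, 2, 5, 10, 20):
    share = fraction_unique(people, range(k))
    print(f"{k:2d} attributes: {share:.1%} of people are unique")
```

No single attribute here distinguishes anyone, since each value is shared by roughly a fifth of the population, but the number of possible combinations grows as 5^k. By 10 attributes there are far more possible combinations than people, so almost everyone ends up alone in their cell even though nothing about any individual is unusual.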