Hacker News new | ask | show | jobs
by coldcog 3663 days ago
It's important to note that either:

- the things you are encrypting are not numbers, or

- your scheme is not as secure as you would like it to be,

where the second point means that necessarily there is some information leakage: that is, more than just their order can be derived from ciphertexts. This is not necessarily bad; the questions is what it is that is leaked, and if that is harmful to your situation.

The website links to two papers, one for each of the above categories, of which only the second one is sufficiently efficient to be of any use. Apart from the order of two ciphertexts, it reveals the index of the first bit at which the ciphertexts differ.

1 comments

> the things you are encrypting are not numbers

I do not believe there's any data that cannot be represented as a number, so this point confuses me.

That's a good point, indeed any sequence of bytes can be interpreted as a number. But it may also encode an element of a set that has an ordering differing from that of the numbers. (For example, you could order, let's say, the set of all cat pictures by how cute they are.) That's what makes the difference.
So if I understand properly, the real issue is whether knowing the ordering and the data type would allow one to infer something about the contents?

I also wonder if there's a way to make that statement of when this causes something to be revealed more mathematically precise... maybe it's something that could even be automatically detected by someone with access to the original data?

You could talk about whether something is unique. For example, Social Security Numbers, telephone numbers, and IP addresses are designed to be unique. By contrast, given names, ages, or favorite colors are not unique in isolation.

Unfortunately, that turned out not to be a hard-and-fast distinction because things that are not unique in isolation are often unique in combination.

https://en.wikipedia.org/wiki/Quasi-identifier

The Netflix deanonymization paper discusses how things (how much you like a movie) that are very non-unique in isolation can be very unique when you have an extremely large number of them. Arvind Narayanan (one of the authors) has given a few discussions of the problem of dimensionality for privacy; one way of thinking of it is that there's an unfathomably large amount of volume in a very high-dimensional space, so there's an extremely large amount of opportunity for points in it to be very far away from each other, even if there's nothing especially "atypical" about the individual points.

This is closely related to the "curse of dimensionality"

https://en.wikipedia.org/wiki/Curse_of_dimensionality

although the curse is often stated from the analyst's perspective when hoping to find patterns, whereas the phenomenon Narayanan is describing is more from the perspective of the individual whose data are in a database and who hopes to appear similar to other individuals in order to retain anonymity, yet turns out to be very distinctive merely because of the number of dimensions.

> you could order, let's say, the set of all cat pictures by how cute they are

This would be a very, very popular web site.