Hacker News
by iraqmtpizza 782 days ago
Encrypting without zipping doesn't leak any information about the content. You can't rule out certain byte sequences (other than by total length) just by looking at the ciphertext length.

If "oui" compresses to two bytes and "non" compresses to one byte, and then you go over them with a stream cipher, which is which?

A: ;

B: *&

This has nothing to do with compression. If you use "yes" and "no" instead of "oui" and "non" (which just happen to be three characters each) and you compress "yes" to "T" and "no" to "F" then the uncompressed text will be the leaky one.
It’s an example meant to prove the idea.
Yes, and my example was an example meant to prove the opposite idea. The point is that it is irrelevant whether you compress or not. You can leak information either way.
I leak the length of my phone call and you leak:

1. the length of your phone call; and

2. what language you were speaking; oh and

3. half the words you said

(i.e. pwned)

https://web.archive.org/web/20080901185111/https://technolog...

> you leak [a bunch of stuff]

How? Remember, the uncompressed text gets encrypted too.

It's in the article if you would bother to read it LOL. "simply measuring the size of packets without decoding them can identify whole words and phrases with a high rate of accuracy . . . [the researchers] can search for chosen phrases within the encrypted data"
Cryptography noob here: I'm confused by "Encrypting without zipping doesn't leak any information about the content." Logically speaking, if we compress first and therefore "the content" will now refer to "the zipped content", doesn't this mean we still can't get any useful information?
Not OP, but 'zipping and encrypting' one thing (a file for example) does not leak information by itself. The problem comes when an adversary is able to see the length of your encrypted data, and then can see how that length changes over time - especially if the attacker can control part of the input fed to the compressor.

So if you compressed the string "Bob likes yams" and I could convince you to append a string to it and compress again, then I could see how much the compressed length changed.

If the string I gave you was already in your data, it would compress better than a string that was not already in your data: "Bob likes yams and potatoes" will compress to a larger size than "Bob likes yams likes Bob".

If the only thing I can see about your data is the length and how it changes under compression - and I can get you to compress that along with data that I hand to you - then eventually I can learn the secret parts of your data.
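A minimal sketch of that length side channel in Python, using the standard zlib module. The helper name `observed_length` and the two guesses are made up for illustration, and it assumes the cipher preserves length (as a stream cipher does), so the attacker effectively sees the compressed size.

```python
import zlib

SECRET = b"Bob likes yams"  # data the attacker cannot read


def observed_length(attacker_input: bytes) -> int:
    """Length the attacker would see on the wire.

    Assumption: encryption is length-preserving (e.g. a stream cipher),
    so the ciphertext is as long as the compressed plaintext.
    """
    return len(zlib.compress(SECRET + attacker_input))


# Two guesses of equal length: one repeats text already in the secret,
# the other does not.
hit = observed_length(b" likes yams")
miss = observed_length(b" and potato")

print(hit, miss)  # the guess that repeats secret text yields the shorter output
```

A real attack repeats this comparison many times with different guesses, which is why it only works when the attacker can keep feeding input to the compressor.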

Encryption generally leaks the size of the plaintext.

This is true in both the compressed and non-compressed case. However with compression the size of the plaintext depends on the contents, so the leak of the size can matter more than when not using compression.

Even without compression this can matter sometimes. Imagine encrypting "yes" vs "no": the ciphertext length alone tells them apart.
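To make the "yes" vs "no" case concrete, here is a toy sketch in Python. `toy_stream_encrypt` is a hypothetical helper that XORs with random bytes; it stands in for a real stream cipher only in the sense that the output is exactly as long as the input.

```python
import os


def toy_stream_encrypt(plaintext: bytes) -> bytes:
    """XOR the plaintext with a fresh random keystream (one-time-pad style).

    Toy code for illustration only: the keystream is discarded, so the
    result cannot be decrypted. The point is that the output length
    always equals the input length.
    """
    keystream = os.urandom(len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, keystream))


yes_ct = toy_stream_encrypt(b"yes")
no_ct = toy_stream_encrypt(b"no")

print(len(yes_ct), len(no_ct))  # 3 2
```

The bytes themselves are random noise, but an observer who knows the answer is either "yes" or "no" only needs the lengths.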

> Encryption generally leaks the size of the plaintext.

Ah, I see. Naïvely, this seems like a really bad thing for an encryption algorithm to do—is there no way around it? Like, why is encryption different from hashing in this regard?

There are methods, but they are generally very inefficient bandwidth-wise. The usual approach is to add extra bytes (padding) so that all messages are a fixed size (or, e.g., rounded up to some power of 2). The higher the fixed size is, the less information is leaked and the less efficient it is. E.g. if you pad to 64 MB but need to transmit a 1 MB message, that is 63 MB of extra data to transmit.
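The arithmetic for both padding schemes can be sketched in a few lines of Python. The function names `pad_to_fixed` and `pad_to_power_of_two` are invented for this example, and they only compute padded lengths, not an actual padding format.

```python
def pad_to_fixed(length: int, bucket: int) -> int:
    """Padded length when messages are rounded up to a multiple of `bucket`."""
    return -(-length // bucket) * bucket  # ceiling division


def pad_to_power_of_two(length: int) -> int:
    """Padded length when messages are rounded up to the next power of two."""
    return 1 if length <= 1 else 1 << (length - 1).bit_length()


MB = 1024 * 1024
message = 1 * MB
padded = pad_to_fixed(message, 64 * MB)

print((padded - message) // MB)  # 63
```

With power-of-two buckets the overhead is always less than 100% of the message, but an observer still learns the size to within a factor of two.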

Part of the problem (afaik) is we lack good math tools to analyze the trade-offs between different padding sizes and how much extra privacy they provide. This makes it hard to reason about how much padding is "enough".

Another approach is adding a random amount of padding. This can be defeated if you can force the victim to resend the same message many times, since averaging the observed sizes cancels out the random padding.
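A rough simulation of that averaging attack in Python. Everything here is an assumption for illustration: padding drawn uniformly from 0..255 bytes, an attacker who knows that distribution, and 20000 forced resends of the same 1000-byte message.

```python
import random

MAX_PAD = 255  # assumed: padding is uniform in 0..MAX_PAD bytes


def padded_length(true_length: int, rng: random.Random) -> int:
    """Size seen on the wire for one transmission with random padding."""
    return true_length + rng.randint(0, MAX_PAD)


def estimate_length(observations: list[int]) -> float:
    """Subtract the expected padding from the average observed size."""
    mean = sum(observations) / len(observations)
    return mean - MAX_PAD / 2


rng = random.Random(0)  # seeded so the run is repeatable
TRUE_LENGTH = 1000
observed = [padded_length(TRUE_LENGTH, rng) for _ in range(20000)]
estimate = estimate_length(observed)

print(round(estimate, 1))  # close to TRUE_LENGTH
```

A single observation is off by up to 255 bytes, but the error of the average shrinks as the number of resends grows.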

Hashing is different because you don't have to reconstruct the message from the hash. With encryption the recipient needs to decrypt the message eventually and get the original back, and there is no way to transmit a (maximally compressed) message in less space than it takes up.

There are special cases where this doesn't apply, e.g. if you send a specific number of bytes on a fixed, agreed-upon schedule.