Hacker News new | ask | show | jobs
by noselfrighteous 3872 days ago
So I read the wiki on the One-time pad and there's something I'm a little stuck on. There's a statement (paraphrasing) that the OTP is immune to cryptanalysis (brute force) because any given key translates to all possible plain-text, and the viable words all have a-priori the same likelihood.

The thing I'm stuck on though, isn't it still possible to do semantic analysis on the various permutations. Basically reading permutations for cogent statements? So do some sort of a-posteriori analysis

Infeasible for a human to do, but assuming one could construct a significantly advanced parser (non-trivial of course), wouldn't it be possible to brute force still? What am I missing?

5 comments

No. What you described will work for a simple substitution cypher, but not for a one time pad. A one time pad is the same length as the message, and permutates every letter independently. Trying all keys will yield every possible plaintext. For example the phrase:

"The swallow flies at midnight"

May (with a one time pad) be encrypted into

"WD4oXOl8yO0QtD4sOf7ip0P7ScIia"

(which, incidentally, is indistinguishable from random noise)

If you just bruteforced that by xor'ing every character with every other possible character you could derive every possible message of that length, such as:

"garfield hate lasagna someday"

"men are cats why even bother?"

"pocket knives go to space yay"

etc ad infinitum

No measure of semantic analysis will help you here!

Well-known caveat for people who are familiar with encryption, but it's worth calling out explicitly here:

If you use the same one time pad to encode two or more different messages, then all the sorts of attack proposed here become plausible again.

The security provided by a one time pad relies entirely on the fact that it is only ever used once.

I'd like to add this scenario actually happened during the Cold War. Soviets were reusing one time pads and the US army decrypted some of the messages, among other things this lead to discovery of Soviet spies targeting the US nuclear weapon program https://en.wikipedia.org/wiki/Venona_project
https://www.youtube.com/watch?v=yxx3Bkmv3ck

Computerphile recently showed how this was done.

Is it really still a "one time pad" if you used it multiple times?
Easy to tell the right sentence because it's the only one capitalized correctly! /s
Ahh yep, got it.
I am out of my element here, but my understanding is that since the key is equal in length to the message, there is no way for you to know whether you are simply seeing a pattern in the key or a pattern in the message.

Imagine a one time pad made for encoding numbers that used a "MOD 10" operation on each digit.

Then imagine the key is:

    6926560279774
And the message is:

    0000000000000
The output is:

    6926560279774
Alternative messages:

    1234567890123 -> 7150027069897
    1111111111111 -> 7037671370885
In all cases, the patterns that you can discern may be from my message and may be from the key. As an analyst, you can't tell.

If this were English letters rather than numbers, and you know 'e' is very common, you still can't get anywhere because each 'e' is encoded with a unique character from the key.

This is a good description, but to add on to it: If there is a pattern in the plaintext, it does not increase the probability that there is a pattern in the ciphertext. It is true that there may be patterns in the ciphertext, but they give you no information about if there is a pattern in the plaintext.
The key is the same size as the message. Each letter translates the corresponding letter and no others. You could make a key to translate the message to anything with the same number of letters.
Well no. What you are describing is basically searchig through all permissible permutations in a given search space, i.e. a thousand monkeys with typewriters. Fron time to time the system will produce something that is not gibberish, but there is no way of knowing if it is related to the true message at all.
The message has an equally probably of decoding to ANY message. You essentially have no information to work with.