by tialaramex 2195 days ago
I really do mean something as simple as checking whether a byte is 0x41 to 0x5A (upper case Latin letters), 0x61 to 0x7A (lower case), 0x30 to 0x39 (numerals) or 0x20 (space).
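For concreteness, here's a minimal Python sketch of that check. The names `is_plain_byte` and `score` are made up for illustration, and scoring a chunk by just counting the bytes that pass is my assumption about how you'd use it:

```python
def is_plain_byte(b: int) -> bool:
    """True if b is an upper case letter, lower case letter, numeral or space."""
    return (
        0x41 <= b <= 0x5A      # 'A' to 'Z'
        or 0x61 <= b <= 0x7A   # 'a' to 'z'
        or 0x30 <= b <= 0x39   # '0' to '9'
        or b == 0x20           # space
    )


def score(chunk: bytes) -> int:
    """Hypothetical scoring rule: count the bytes in chunk that pass the check."""
    return sum(is_plain_byte(b) for b in chunk)


print(score(b"Hello, world 42!"))  # prints 14: only ',' and '!' fail
```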

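With XOR you've got 8 bits to correct, so let's look at each in turn to see why my cheesy approach is enough on a chunk of likely ASCII or ASCII-compatible text. "Wrong" below means the guessed key byte differs from the real one in just that bit, so every decrypted byte comes out with that bit flipped: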

0x80, the most significant bit - If this is wrong nothing will match, because 0x80 isn't set in any of the bytes we accept.

0x40 - If this is wrong, space won't match, upper case letters won't match, and lower case 'a' through 'o' plus 'z' won't match. Numerals still match, though, because they turn into 'p' through 'y' (and 'p' through 'y' turn into numerals).

0x20 - If this is wrong, space and numerals don't match, but every letter still does. It also "flips" the case, which means that if you're just fishing for the plaintext and there were no spaces or numerals, it's still easily readable.

0x10 - If this is wrong, all numerals except zero don't match, and letters 'K' through 'P' in either case don't match. Space still matches here, because it turns into '0' (and '0' turns into space).

0x08 - If this is wrong, space doesn't match. Neither do digits 2 through 7, or the letters 'H' and 'S' through 'W' in either case.

0x04 - If this is wrong, space doesn't match. Digits 8 and 9 don't match. Letters 'D' and 'X' through 'Z' in either case don't match.

0x02 - If this is wrong, space doesn't match. Neither do digits 8 and 9 again, and this time only the letters 'B' and 'Y' (either case).

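0x01, the least significant bit - If this is wrong, space doesn't match and neither do the letters 'A' and 'Z' (either case), but every numeral still matches.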
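If you'd rather not trust hand-worked bit fiddling, the per-bit list can be generated. This is a hypothetical Python sketch: it treats "this bit is wrong" as XORing every accepted character with that single bit, and prints which accepted characters no longer pass the check. The names `ACCEPTED` and `failures_for_bit` are made up:

```python
import string

# The accepted characters: upper case, lower case, numerals and space.
ACCEPTED = string.ascii_uppercase + string.ascii_lowercase + string.digits + " "
ACCEPTED_BYTES = {ord(c) for c in ACCEPTED}


def failures_for_bit(bit: int) -> str:
    """Accepted characters that stop matching when the key is wrong in `bit`,
    i.e. when each decrypted byte has that bit flipped."""
    return "".join(c for c in ACCEPTED if (ord(c) ^ bit) not in ACCEPTED_BYTES)


# Walk from the most significant bit (0x80) down to the least (0x01).
for shift in range(7, -1, -1):
    bit = 1 << shift
    print(f"0x{bit:02X}: {failures_for_bit(bit)!r}")
```

When comparing its output by hand, the easy cases to miss are letters that land just outside a range, such as 'P' XOR 0x10 giving '@' or 'Z' XOR 0x01 giving '['.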

You can see that space is really doing a lot of work for you here, but on the other hand humans really like spaces. If the text you thought would be French poetry is actually Python, it's likely still full of spaces even though the letter frequencies are way off.
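A hypothetical sketch of putting the check to work on a single key byte: try all 256 values and keep the one whose output passes most often. The brute-force loop, the name `best_single_byte_key`, the tie-breaking rule, the sample line and the key 0x5C are all my own assumptions for illustration:

```python
def is_plain_byte(b: int) -> bool:
    # Upper case, lower case, numerals or space: the same cheesy check.
    return 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A or 0x30 <= b <= 0x39 or b == 0x20


def best_single_byte_key(ciphertext: bytes) -> int:
    """Try all 256 key bytes; return the one whose output passes the check most
    often. max() keeps the first maximum, so ties go to the smallest key."""

    def score(key: int) -> int:
        return sum(is_plain_byte(c ^ key) for c in ciphertext)

    return max(range(256), key=score)


# Made-up example: a short Python-ish line encrypted with a made-up key.
plaintext = b"return a if a else b"
ciphertext = bytes(p ^ 0x5C for p in plaintext)

found = best_single_byte_key(ciphertext)
print(hex(found), bytes(c ^ found for c in ciphertext))
# prints: 0x5c b'return a if a else b'
```

In a line this short the spaces matter: a wrong key that differs from 0x5C only in the 0x20 bit still passes all 15 letters (in the other case) and loses only on the five spaces.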