by lelf 4617 days ago
한글 is in the Basic Multilingual Plane. It's U+D55C U+AE00.

I was wondering whether, when it adds 0x8000 (or whatever it's doing), it lands on 0x1.... codepoints and does weird things with them because of the encoding. Here's a trace of 한글 through this 'rot8000', though:

한글: 0xd55c 0xae00
똼軠: 0xb63c 0x8ee0
霜激: 0x971c 0x6fc0
矼傠: 0x77fc 0x50a0
壜ㆀ: 0x58dc 0x3180
㦼በ: 0x39bc 0x1260
᪜ㆀ: 0x1a9c 0x3180
㦼በ: (repeating)
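
For what it's worth, the step size in that trace can be checked directly. This is only arithmetic on the numbers listed above, not a reimplementation of rot8000; the `trace` list and the `deltas` helper are names made up for this sketch. Each step moves every codepoint by 0x1F20 rather than 0x8000: downward until the values get small, then back and forth.

```python
# Codepoint pairs copied from the trace above; the last row is the
# one marked "(repeating)".
trace = [
    (0xD55C, 0xAE00),  # 한글
    (0xB63C, 0x8EE0),
    (0x971C, 0x6FC0),
    (0x77FC, 0x50A0),
    (0x58DC, 0x3180),
    (0x39BC, 0x1260),
    (0x1A9C, 0x3180),
    (0x39BC, 0x1260),  # repeats from here
]


def deltas(rows):
    """Per-codepoint difference between each row and the one before it."""
    return [
        tuple(cur - prev for prev, cur in zip(prev_row, cur_row))
        for prev_row, cur_row in zip(rows, rows[1:])
    ]


for row, step in zip(trace[1:], deltas(trace)):
    text = "".join(chr(cp) for cp in row)
    print(text, " ".join(f"{d:+#x}" for d in step))
```

The first five steps are -0x1f20 on both codepoints; after that the two codepoints alternate between -0x1f20 and +0x1f20, which is the loop in the trace.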

So... yeah. Weirdness all around. You might have better luck with a carefully crafted XOR pad for each codepoint, chosen so that it's likely to hit a printable character but can never land in the 0xD800..0xDFFF range (or similar ranges). Trying to "wrap" in Unicode would require first mapping the codepoints onto some continuous numeric representation.
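
A minimal sketch of the XOR idea, under one assumption of mine: the pad is kept below 0x800. The surrogate range 0xD800..0xDFFF is exactly one 0x800-aligned block, and XOR with a value under 0x800 only touches the low 11 bits, so a codepoint outside that block can never be moved into it. `xor_pad` and `MASK` are hypothetical names, and this does nothing to guarantee the result is printable or assigned; that part would still need the "carefully crafted" pad.

```python
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MASK = 0x0555  # hypothetical pad; any value below 0x800 has the same property


def xor_pad(text, mask=MASK):
    """XOR each codepoint with `mask`; applying it twice restores the input.

    Assumption: mask < 0x800, so bits 11 and up are preserved and each
    codepoint stays inside its own 0x800-aligned block.
    """
    if not 0 <= mask < 0x800:
        raise ValueError("mask must be in 0..0x7FF")
    out = []
    for ch in text:
        cp = ord(ch)
        if SURROGATE_FIRST <= cp <= SURROGATE_LAST:
            out.append(ch)  # lone surrogate in the input: pass it through
        else:
            out.append(chr(cp ^ mask))
    return "".join(out)


scrambled = xor_pad("한글")
print([hex(ord(c)) for c in scrambled])
print(xor_pad(scrambled))  # round-trips back to the original text
```

Because XOR is its own inverse, the same function encodes and decodes, which is the property rot13-style schemes want.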

Something might be off in the math, though: there are some work-arounds to skip control characters, and those might misbehave when starting in this range.