| HN Mirror


by EthanHeilman 2389 days ago

I agree that recovering the encodings of CJK Unified Ideographs would be far harder than recovering a phonetic alphabet; however, a few things could make it less hard than it seems. The decoder has access to a text in both the future format and UTF-8. A text might mix phonetic words and ideographs, as Japanese sometimes does today. The phonetic words would provide clues as to the ideographic characters.
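To make the parallel-text idea concrete, here is a rough Python sketch, not a real decipherment tool. It assumes the decoder already holds the same text as raw UTF-8 bytes and as a sequence of known symbols, and that the two line up one character at a time. The names `split_utf8` and `recover_mapping` are made up for illustration. The only UTF-8 fact it relies on is that continuation bytes have the bit pattern `10xxxxxx`, so character boundaries can be found without knowing what any character means.

```python
def split_utf8(data: bytes) -> list[bytes]:
    """Split raw bytes into per-character chunks using bit patterns only.

    A byte of the form 0b10xxxxxx continues the previous character;
    any other byte starts a new one.
    """
    chunks: list[bytes] = []
    for b in data:
        if b & 0xC0 == 0x80 and chunks:
            chunks[-1] += bytes([b])   # continuation byte
        else:
            chunks.append(bytes([b]))  # lead byte or single-byte character
    return chunks


def recover_mapping(utf8_bytes: bytes, known_symbols: list[str]) -> dict[bytes, str]:
    """Pair each byte chunk with the symbol at the same position.

    Assumes (hypothetically) a perfect one-to-one alignment between the
    two copies of the text; raises if the alignment is inconsistent.
    """
    chunks = split_utf8(utf8_bytes)
    if len(chunks) != len(known_symbols):
        raise ValueError("texts do not align one character at a time")
    mapping: dict[bytes, str] = {}
    for chunk, symbol in zip(chunks, known_symbols):
        if mapping.setdefault(chunk, symbol) != symbol:
            raise ValueError(f"chunk {chunk!r} maps to two different symbols")
    return mapping


# Parallel text: the same sentence as UTF-8 bytes and as known symbols.
parallel = "日本語のテキスト"
table = recover_mapping(parallel.encode("utf-8"), list(parallel))

# The recovered table can then read other UTF-8 data that reuses those characters.
unseen = "日本のテスト".encode("utf-8")
decoded = "".join(table[chunk] for chunk in split_utf8(unseen))
```

Real parallel texts would not align this cleanly, so in practice the table would be built up statistically, with the phonetic words acting as cribs for the ideographs next to them.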

Code breakers have decoded ciphertexts in which each word was replaced with a number. To make it even harder, common words would be replaced by more than one number to defeat common frequency analysis techniques. This was often done with pen and paper.
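As a toy illustration of that kind of code (a minimal sketch with an invented codebook, not any historical system), the Python below gives common words several numbers and rotates through them, which flattens the frequency counts an analyst would see. `CODEBOOK`, `encode`, and `decode` are hypothetical names.

```python
from collections import Counter
from itertools import cycle

# Invented codebook: common words get several numbers, rare words get one.
CODEBOOK = {
    "the": [11, 42, 87],
    "of": [23, 56],
    "attack": [70],
    "dawn": [91],
}


def encode(words, codebook=CODEBOOK):
    """Replace each word with a number, rotating through its alternatives."""
    rotors = {word: cycle(codes) for word, codes in codebook.items()}
    return [next(rotors[word]) for word in words]


def decode(numbers, codebook=CODEBOOK):
    """Invert the codebook: every number maps back to exactly one word."""
    reverse = {code: word for word, codes in codebook.items() for code in codes}
    return [reverse[n] for n in numbers]


message = "the attack of the dawn the dawn of the attack".split()
ciphertext = encode(message)

word_counts = Counter(message)       # "the" stands out with 4 occurrences
code_counts = Counter(ciphertext)    # no single number appears more than twice
```

Even so, codes like this were broken, because word order, repeated phrases, and guessed content still leak structure.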

Yuri Knorozov managed to decipher the Mayan script. That was a significantly harder task than recovering UTF-8 mappings because he had very little to work with on the source language (he did have some things).

1 comment

tripzilch 2387 days ago

Exactly. You shouldn't underestimate the tremendous amount of work that has been put into deciphering actual ancient languages using advanced techniques and minor contextual clues. Compared to that, deciphering most common UTF-8 data would be relatively simple, meaning it could be done by a single person with some reverse engineering skills.
