Hacker News
by numpad0 1602 days ago
I think by far the largest contributor to the coining of "mojibake" was e-mail MTAs. Some e-mail implementations assumed 7-bit ASCII for all text and dropped the MSB of every byte in 8-bit SJIS/Unicode/etc. text, which then arrived corrupted at the receiving end. Next up was text written in EUC-JP (Extended UNIX Code), probably by someone running either a real Unix (likely Solaris) or early GNU/Linux, and floppies from classic Mac OS computers. Those must have defined the term, and various edge cases on the web, like header-encoding mismatches, popularized it.
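To make the two failure modes concrete, here is a rough Python sketch. It assumes the relay simply clears the high bit of each byte, which is my simplification and not a model of any particular MTA. `strip_msb` is a made-up helper name, and the sample string is arbitrary.

```python
def strip_msb(data: bytes) -> bytes:
    """Clear the high bit of every byte (hypothetical 7-bit-only relay)."""
    return bytes(b & 0x7F for b in data)


text = "日本語"

# Failure mode 1: 8-bit Shift_JIS passes through a 7-bit-only hop.
sjis = text.encode("shift_jis")
stripped = strip_msb(sjis)
# Every byte is now below 0x80, so it decodes as ASCII, but as garbage.
garbled = stripped.decode("ascii", errors="replace")

# Failure mode 2: the bytes are EUC-JP but the receiver assumes Shift_JIS
# (for example because a header declares the wrong charset).
euc = text.encode("euc_jp")
mismatch = euc.decode("shift_jis", errors="replace")

print(repr(garbled))
print(repr(mismatch))
```

In the first case the damage is permanent, since the dropped bits cannot be recovered from the message alone. In the second the bytes are intact, so decoding again with the right codec recovers the text.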

The "Zhonghua fonts" issue is not necessarily linked to encoding; it's an issue of assuming or guessing locales, which has to be solved either by adding a language identifier or by ending Han unification.
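A small sketch of why this is a locale problem and not an encoding one: a unified ideograph has a single code point and the same bytes whether the text is Japanese or Chinese, so the language has to come from outside the text. The character 直 is just my example, `tag_lang` is a hypothetical helper, and the HTML `lang` attribute stands in for "a language identifier".

```python
import unicodedata

ch = "直"  # one unified code point, drawn differently in Japanese and Chinese fonts

# The encoded form carries no hint of the intended language.
code_point = ord(ch)
name = unicodedata.name(ch)
utf8_bytes = ch.encode("utf-8")


def tag_lang(s: str, lang: str) -> str:
    """Wrap text in a span with an out-of-band language tag (illustrative only)."""
    return f'<span lang="{lang}">{s}</span>'


as_japanese = tag_lang(ch, "ja")
as_chinese = tag_lang(ch, "zh-Hans")

print(hex(code_point), name, utf8_bytes)
print(as_japanese)
print(as_chinese)
```

The two spans differ only in the tag. Without it, a renderer has to guess the locale to pick a font.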