Hacker News
by robjan 2294 days ago
From the perspective of a person who uses an alphabetic language, such as English, sure, Unicode can be "done". But if your language is based on ideograms, like Chinese, then it'll never be "done". As words are created they need to be encoded.

Again, that's great and I understand that (I've studied Japanese), but that's only part of the new version. They're not adding pictures of "mousetrap" and "olives" and "toilet plunger" because any existing language needs to write these.

Furthermore, I'm really starting to question the way CJK is encoded. We don't make every English word a separate codepoint. 97% of these CJK ideographs are just different combinations of the same few radicals. Korean seems especially weird, as it has both the individual jamo (letters) and every precomposed syllable (in a block that's been rearranged once or twice, on the basis that nobody was really using it yet). I'm not saying we should nix all precomposed Hanzi/Kanji, exactly, because that's a very convenient way for programs to handle text, but it seems like this system is becoming increasingly awkward for non-western languages.
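
To make the Korean case concrete, here is a rough Python sketch of how a precomposed Hangul syllable maps back to its individual letters (jamo) by plain arithmetic. The constants are the ones given in the Unicode Standard's Hangul syllable decomposition algorithm; the function name `decompose_hangul` is just something I made up for illustration. It also shows that a Han ideograph such as 漢 has no comparable decomposition into radicals under normalization.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28          # vowels, trailing consonants (incl. "none")
N_COUNT = V_COUNT * T_COUNT        # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT             # 11172 precomposed syllables in total


def decompose_hangul(syllable: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining jamo.

    Hypothetical helper for illustration; it mirrors what NFD does.
    """
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    leading = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    # t_index == 0 means the syllable has no trailing consonant.
    trailing = chr(T_BASE + t_index) if t_index else ""
    return leading + vowel + trailing


han = "\ud55c"  # 한
jamo = decompose_hangul(han)

# The arithmetic agrees with the standard library's canonical decomposition,
# and NFC puts the pieces back together into the single code point.
print([f"U+{ord(c):04X}" for c in jamo])
print(jamo == unicodedata.normalize("NFD", han))
print(unicodedata.normalize("NFC", jamo) == han)

# A unified Han ideograph has no canonical decomposition at all.
print(unicodedata.normalize("NFD", "\u6f22") == "\u6f22")  # 漢
```

So Korean gets both representations plus a well-defined mapping between them, while Han ideographs are encoded only as whole characters as far as normalization is concerned.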

I feel there's a fundamental flaw when our "universal" text encoding system can't handle the regular creation of new words in a well-understood way, for languages spoken by 1/3rd of the world's population. It's like we're issuing hardware patches for a software problem.

It is. I do not like the way CJK is being dealt with. Not to mention that fonts don't include all the CJK variants of a glyph, which matters when I use the same word but need the J or K variant because that is how it was supposed to be written.

Even the "C" has traditional and simplified variants.

Fortunately, I think Unicode is pretty much done for alphabetic languages. Someday, if the CJK design in Unicode isn't good enough, breaking it off into something better isn't entirely impossible.