|
|
|
|
|
by kazinator
2390 days ago
|
|
No, where Unicode is complicated is where the Unicode people decided to make it complicated to bolster their egos, to the detriment of everyone downstream of them. Like with most standardization, the people at the helm are the wrong people with the wrong motivations. |
|
Most complexity in Unicode derives from:
neither of which is something that Unicode could have avoided. Complexity in human scripts necessarily leads to complexity in Unicode. Not having Unicode at all would be much worse than Unicode could possibly seem to you -- you'd have to know the codeset/encoding of every string/file/whatever, and never lose track. Not having politics affect Unicode is a pipe dream, and politics has unavoidably led to some duplication.Confusability is NOT a problem created by Unicode, but by humans. Even just within the Latin character set there are confusables (like 1 and l, which often look the same, so much so that early typewriters exploited such similarities to reduce part count).
Nor were things like normalization avoidable, since even before Unicode we had combining codepoints: ASCII was actually a multibyte Latin character set, where, for example, á could be written as a<BS>' (or '<BS>a), where <BS> is backspace. Many such compositions survive today -- for example, X11 Compose key sequences are based on those ASCII compositions (sadly, many Windows compositions aren't). The moment you have combining marks, you have an equivalence (normalization) problem.
Emojis did add a bunch of complexity, but... emojis are essentially a new script that was created by developers in Japan. Eventually the Unicode Consortium was bound to have to standardize emojis for technical and political reasons.
Of course, some mistakes were made: UCS-2/BMP, UTF-16, CJK unification, and others. But the lion's share of Unicode's complexity is not due to those mistakes, but to more natural reasons.