| HN Mirror


by ubernostrum 3210 days ago

Speaking for myself, Unicode's original fundamental mistake was one that could only be recognized as a mistake in hindsight: insisting on round-trip compatibility with existing encodings.

Round-trip compatibility meant Unicode had to not only adopt but permanently preserve all the mistakes and inconsistencies of encodings which were popular at the time. Which is how we get a bunch of duplicates, a bunch of code points that are there but only supposed to be used for round-tripping, some of the un-fun edge cases for Latin text where things have both composed and decomposed forms, some of the weirder aspects of equivalence and normalization, etc.
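To make the composed/decomposed point concrete, here is a minimal sketch using Python's standard `unicodedata` module. The specific characters (é and the Ångström sign) are my own picks for illustration, not examples from any particular legacy encoding discussion.

```python
import unicodedata

# Two ways to write the "same" letter:
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT, two code points

# A naive comparison treats them as different strings.
naive_equal = composed == decomposed

# Normalizing both to the same form (NFC composes, NFD decomposes) makes them match.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed

# A duplicate kept around for compatibility: ANGSTROM SIGN canonically maps to
# LATIN CAPITAL LETTER A WITH RING ABOVE.
angstrom_sign = "\u212b"
a_with_ring = "\u00c5"
angstrom_normalized = unicodedata.normalize("NFC", angstrom_sign)

print(naive_equal, nfc_equal, nfd_equal, angstrom_normalized == a_with_ring)
```

Any code that compares, hashes, or de-duplicates text has to pick a normalization form first, which is the sort of edge case that would likely be smaller without the inherited duplicates.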

At the time it seemed like a smart and rational thing to do since it meant you could losslessly transition from your existing character set, and then losslessly go back to it if you wanted to, but now that Unicode "won" it's just a source of "well, that's annoying and inconsistent but they needed it for round-tripping" explanations.

In particular, round-trip compatibility meant that Unicode ended up containing a bunch of variant forms of things that existing encodings treated as distinct characters, but which probably would not pass the test of being distinct graphemes by Unicode's definition. Declaring the variant forms to be a contextual issue left up to the font or the rendering system would have been better.
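As a rough illustration of those variant forms, the sketch below looks at three compatibility characters with Python's `unicodedata`. The three characters are my own choice of examples; the point is only that Unicode itself tags them as variants of another character, and that compatibility normalization (NFKC) folds them away while canonical normalization (NFC) leaves them alone.

```python
import unicodedata

# Hypothetical sample set, chosen for illustration only.
variants = {
    "ligature": "\ufb01",      # LATIN SMALL LIGATURE FI
    "fullwidth": "\uff21",     # FULLWIDTH LATIN CAPITAL LETTER A
    "final_form": "\ufe8e",    # ARABIC LETTER ALEF FINAL FORM
}

# The decomposition field carries a tag such as <compat>, <wide> or <final>
# followed by the code point(s) the variant stands in for.
tags = {name: unicodedata.decomposition(ch) for name, ch in variants.items()}

# NFC keeps the variants as distinct characters; NFKC replaces them
# with the plain character(s) they are variants of.
nfc = {name: unicodedata.normalize("NFC", ch) for name, ch in variants.items()}
nfkc = {name: unicodedata.normalize("NFKC", ch) for name, ch in variants.items()}

for name in variants:
    print(name, tags[name], repr(nfc[name]), repr(nfkc[name]))
```

The final-form Arabic letter is the clearest case of the contextual issue: which shape to draw depends on position in the word, which a rendering system can work out from the plain letter.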

Ironically, the second big mistake was to then try to switch philosophies and do just that with the CJK characters, sparking the whole Han unification mess.