|
If we could go back in time to Unicode's beginning and start over but with all that we know today... Unicode would still look a lot like what it looks like today, except that:
- UTF-8 would have been specified first
- we'd not have had UCS-2, nor UTF-16
- we'd have more than 21 bits of codespace
- CJK unification would not have been attempted
- we might or might not have pre-composed codepoints[0]
- a few character-specific mistakes would have gone unmade
which is to say, again, that Unicode would mostly come out the same as it is today.
Everything to do with normalization, graphemes, and all the things that make Unicode complex and painful would still have to be there because they are necessary and not mistakes. Unicode's complexity derives from the complexity of human scripts.
[0] Going back further to create Unicode before there was a crushing need for it would be impossible -- try convincing computer scientists in the 60s to use Unicode... Or IBM in the 30s. For this reason, pre-composed codepoints would still have proven to be very useful, so we'd probably still have them if we started over, and we'd still end up with NFC/NFKC being closed to new additions, which would leave NFD as the better NF just as it is today.
|
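For anyone who hasn't run into the pre-composed vs. decomposed distinction or the 21-bit ceiling mentioned above, here is a minimal sketch in Python using only the standard `unicodedata` module. The variable names and the choice of "é" as the example are illustrative assumptions, not anything from the comment itself.

```python
import unicodedata

# Two spellings of the same user-perceived character "é".
precomposed = "\u00e9"    # single pre-composed code point U+00E9
decomposed = "e\u0301"    # "e" followed by U+0301 COMBINING ACUTE ACCENT

# As raw code point sequences they differ...
raw_equal = precomposed == decomposed

# ...but each normalization form maps both spellings to one sequence.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
nfd_a = unicodedata.normalize("NFD", precomposed)
nfd_b = unicodedata.normalize("NFD", decomposed)

nfc_len = len(nfc_b)  # NFC prefers the pre-composed code point
nfd_len = len(nfd_a)  # NFD prefers base letter + combining mark

# The codespace ceiling comes from UTF-16: the 16-bit range plus
# 1024 * 1024 surrogate pairs tops out at U+10FFFF, which needs 21 bits.
utf16_max = 0xFFFF + 0x400 * 0x400
bits_needed = utf16_max.bit_length()
```

The point of the sketch is only that normalization has to exist as soon as both spellings are allowed, and that the codespace limit is an artifact of UTF-16 rather than of UTF-8.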
Love an interesting sci-fi scenario. UTF-8 was a really neat technical trick, and a lot of the early UTF-8 technical documentation was already on IBM letterhead. I think if you showed up with the right documents at various points in history IBM would have been ecstatic to have an idea like UTF-8, at least. UTF-8 would have sidestepped a lot of mistakes with code pages and CCSIDs (IBM's attempts at 16-bit characters, encoding both code page and character), and IBM developers likely would have enjoyed that. Also, they might have been delightfully confused about how the memo was coming from inside the house by coworkers not currently on payroll.
Possibly that even extends as far back as the 1930s and formation of the company, because even then IBM aspired to be a truly International company, given the I in its own name.
I'm not sure how much of the rest of Unicode you could have convinced them of, but it's funny imagining explaining, say, Emoji to IBM suits at various points in history.