| > WTF business do emojis have in Unicode? Unicode didn't invent emoji, they incorporated it because they were already popular in Japan, and if they didn't incorporate it, it would greatly reduce Japanese adoption. Keep in mind that Unicode was intended to unify all the disparate encodings that had been brewed up to support different languages and which made exchanging documents between non-English speaking countries a nightmare. The term "mojibake" comes to mind [0] - Japan alone had so many encodings that a slang term for text encoded with something different than what your device expected (and subsequently got rendered as nonsensical/garbled text) came about. And they weren't alone, of course [1]. > What we need now is a standardized, sane subset of Unicode that implementations can support while rejecting the insane scope creep that got added on top of that. Unicode wasn't intended to be pretty. It was intended to be the one system that everyone used, and a way to increase adoption was to do some less than ideal things, like duplicate characters (so it would be easier to convert to Unicode). You may never need anything outside the BMP, but that doesn't make the rest of the planes worthless. Ignoring the value of including dead and nearing-extinct languages for preservation purposes (not being able to type a language will basically guarantee its extinction, with inventing a new encoding and storing text as jpgs being the only real alternatives), there are a lot of people speaking languages found in the SMP [2][3] ([2] has 83 million native speakers, for example). [0]: https://en.wikipedia.org/wiki/Mojibake [1]: https://segfault.kiev.ua/cyrillic-encodings/ [2]: https://en.wikipedia.org/wiki/Modi_(Unicode_block) [3]: https://en.wikipedia.org/wiki/Chakma_(Unicode_block) |
Mojibake was not a "Japan has too many encodings" problem. It was a "Western developers assume everyone is using CP1252" problem.
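For anyone who hasn't seen it happen, here is a rough sketch of the failure mode in Python. The sample string and the variable names are mine, and the choice of Shift_JIS as the sender's encoding is just an assumption for illustration; any legacy Japanese encoding read as CP1252 goes wrong in the same way.

```python
# Sketch: the sender writes Japanese text as Shift_JIS bytes, and the
# receiver wrongly assumes the bytes are CP1252.
text = "文字化け"  # sample string, chosen for illustration

# What actually goes over the wire or onto disk.
raw = text.encode("shift_jis")

# What software that hard-codes CP1252 shows the user.
# errors="replace" guards against bytes CP1252 leaves undefined.
garbled = raw.decode("cp1252", errors="replace")

# The bytes themselves were never damaged: decoding with the encoding
# the sender actually used recovers the original text.
recovered = raw.decode("shift_jis")

print(repr(garbled))
print(recovered)
```

Nothing is lost in transit. The only thing that is wrong is the receiver's guess about the encoding, which is why I'd put the blame on the assumption rather than on the number of encodings.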
> Unicode wasn't intended to be pretty. It was intended to be the one system that everyone used, and a way to increase adoption was to do some less than ideal things, like duplicate characters (so it would be easier to convert to Unicode).
Unfortunately, they undermined all that with Han unification, with the result that Unicode is never going to be adopted in Japan.