| The thing that frustrates me the most about Unicode emoji is the astounding number of combining characters. For combining characters in written languages, you can do an NFC normalization and, with moderate success, get a 1 codepoint = 1 grapheme mapping, but "Emoji 2.0" introduced some ridiculous emoji compositions with the ZWJ character. To use the author's example: woman - 1 codepoint; black woman - 2 codepoints (woman + dark Fitzpatrick modifier); woman kissing woman - 7 codepoints (woman + ZWJ + heart + ZWJ + kissy lips + ZWJ + woman). It's like composing Mayan pictographs, except you have to include an invisible character in between each component. Here's another fun one: country flags. Unicode has special characters 🇱 🇮 🇰 🇪 🇹 🇭 🇮 🇸 that you can combine into country codes to create a flag. 🇰+🇷 = 🇰🇷 edit: looks like HN strips emoji? Changed the emoji in the example into English words. They are all supposed to render as a single "character". |
No, no, no, NO.
Please stop spreading this bit of misinformation. Emoji has not changed this situation.
Hangul is one script where NFC won't work well. Yes, every possible modern Hangul syllable block is already encoded in precomposed (NFC) form, but this ignores syllables with double choseongs or double jungseongs that can be found in older text, and which you sometimes see in modern text, actually.
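To make the Hangul case concrete, here is a small Python sketch using only the standard `unicodedata` module. The helper name `nfc_codepoints` and the two sample strings are my own picks for illustration; any jamo outside the modern ranges should behave like the second one.

```python
import unicodedata

def nfc_codepoints(text: str) -> list[str]:
    """Return the code points of `text` after NFC normalization, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]

# Modern syllable built from jamo: choseong HIEUH + jungseong A + jongseong NIEUN.
modern = "\u1112\u1161\u11ab"

# Old-style syllable: double choseong SSANGNIEUN (U+1114) + jungseong A.
# This is an illustrative choice, not an example taken from the thread.
old = "\u1114\u1161"

print(nfc_codepoints(modern))  # composes into one precomposed syllable
print(nfc_codepoints(old))     # no precomposed form exists, so the jamo stay separate
```

The modern sequence collapses to a single code point, while the old-style one stays as two even though it is still one syllable block to a reader.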
All Indic scripts (well, all scripts derived from Brahmi, so this includes many scripts from Southeast Asia as well like Thai) would have trouble doing the NFC thing.
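The same check works for Brahmi-derived scripts. This is a rough sketch with two sample strings I chose myself (a Devanagari consonant plus vowel sign, and a Thai consonant plus vowel plus tone mark); each is read as a single unit, but NFC has no precomposed code point to collapse it into.

```python
import unicodedata

# Sample strings are illustrative assumptions, not taken from the thread.
samples = {
    "Devanagari ki": "\u0915\u093f",        # KA + VOWEL SIGN I
    "Thai thi": "\u0e17\u0e35\u0e48",       # THO THAHAN + SARA II + MAI EK
}

# Count code points remaining after NFC normalization.
nfc_lengths = {
    name: len(unicodedata.normalize("NFC", text))
    for name, text in samples.items()
}

for name, count in nfc_lengths.items():
    print(f"{name}: {count} code points after NFC")
```

Getting from code points to what a user sees as one character needs grapheme cluster segmentation, which Python's standard library does not provide, so this only shows the code point side.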
I am annoyed that the Unicode spec introduced more complexity into its algorithms to support emoji, but only because it could have achieved mostly the same thing without any emoji-specific complexity, by reusing features that existing scripts already have and that are already accounted for.