by Manishearth
3384 days ago
> The thing that frustrates me the most about Unicode emoji is the astounding number of combining characters. For combining characters in written languages, you can do an NFC normalization and, with moderate success, get a 1 codepoint = 1 grapheme mapping, but "Emoji 2.0" introduced some ridiculous emoji compositions with the ZWJ character.

No, no, no, NO. Please stop spreading this bit of misinformation. Emoji has not changed this situation.

Hangul is one script where NFC won't work well. Yes, we actually already encode all possible modern Hangul syllable blocks in NFC form as well, but this ignores characters with double choseongs or double jungseongs that can be found in older text. Which you sometimes see in modern text, actually.

All Indic scripts (well, all scripts derived from Brahmi, so this includes many scripts from Southeast Asia as well, like Thai) would have trouble doing the NFC thing.

I am annoyed that the Unicode spec introduced more complexity into their algorithms to support emoji, but this is because they could have achieved mostly the same task by not introducing emoji-specific complexity and instead reusing features that existing scripts already have and that have already been accounted for.
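To make the point concrete, here is a minimal sketch using Python's standard `unicodedata` module. The sample strings and the `nfc_len` helper are my own illustrative choices, not anything from the parent comment; the script only counts code points after NFC and does not do real grapheme segmentation.

```python
import unicodedata


def nfc_len(text: str) -> int:
    """Return the number of code points left after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))


# Illustrative samples (assumed for this sketch). Each one is meant to be
# something a reader would perceive as a single written unit.
samples = {
    "Latin e + combining acute": "e\u0301",
    "Modern Hangul jamo (han)": "\u1112\u1161\u11ab",
    "Hangul with doubled choseong": "\u1100\u1100\u1161",
    "Devanagari ka + vowel sign i": "\u0915\u093f",
    "Thai ko kai + sara i": "\u0e01\u0e34",
}

for label, text in samples.items():
    print(f"{label}: {len(text)} -> {nfc_len(text)} code points after NFC")
```

The Latin and modern Hangul samples should collapse to a single code point, while the doubled-choseong, Devanagari, and Thai samples should stay at two, since no precomposed form exists for them. None of that involves emoji.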