by wisty
5401 days ago
Sorry, my Unicode is a bit weak. I think Twitter should count with whichever form usually gives the user the most characters, so they don't get burnt. In many cases I think the normalized (composed) form is more permissive, since it combines "character plus diacritic" into a single code point. In a language with lots of diacritics, the raw number of code points can be higher than the number of normalized characters, depending on what the client sends. You wouldn't want to allow (say) ~70 Korean characters on one OS and 140 on another just because they use different code points to represent the same characters: one sending the character followed by a diacritic (2 code points?), the other with both crammed together in one code point. But as I said, I'm not a Unicode guru (and I don't know much about Korean, I just saw it used as an example), so this might be wrong.
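
To make the difference concrete, here is a rough Python sketch using the standard library's `unicodedata.normalize`. The helper name `count_code_points` is made up for illustration, and this is not a claim about how Twitter actually counts. One detail worth noting: for Korean, the decomposed form splits each syllable into its component jamo (2 or 3 code points) rather than a base character plus a diacritic, but the effect on the count is the same kind of thing.

```python
import unicodedata


def count_code_points(text: str, form: str = "NFC") -> int:
    """Count code points after normalizing to the given form.

    Hypothetical helper for illustration only. "NFC" is the composed
    form, "NFD" is the decomposed form.
    """
    return len(unicodedata.normalize(form, text))


# The same two-syllable Korean word, counted both ways.
word = "한글"
print(count_code_points(word, "NFC"))  # 2: one code point per syllable
print(count_code_points(word, "NFD"))  # 6: three jamo per syllable

# A Latin letter with a diacritic behaves the same way.
accented = "é"
print(count_code_points(accented, "NFC"))  # 1: single precomposed code point
print(count_code_points(accented, "NFD"))  # 2: "e" plus a combining accent
```

So if the limit were checked against whatever the client happens to send, a client that sends decomposed text would use up the budget two or three times faster for the same visible message. Normalizing to the composed form before counting would give every client the same limit.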