by adgar
5408 days ago
Wait, really? I thought normalization form C was the form where composite characters were always used when possible. Why restrict by the number of codepoints (vs. characters) if you're explicitly going to use the form which goes out of its way to use multi-codepoint characters? The only reason I can think of is that they internally use UTF-32, so counting codepoints is more efficient. But I thought they used UTF-8. Edit: the other reason I can think of is that conversion to normalization form C already counts the codepoints. Though I can't imagine it would be hard to make it count characters as well.
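For anyone who wants to poke at the codepoint vs. character distinction, here is a rough Python sketch using the standard `unicodedata` module. The helper name `nfc_code_points` is made up for illustration, and this is not a claim about how the service actually counts. As far as I know, NFC composes to a single precomposed codepoint where one exists and leaves the combining sequence alone where it doesn't.

```python
import unicodedata


def nfc_code_points(text: str) -> int:
    """Count codepoints after NFC normalization (hypothetical helper)."""
    # Python's len() on a str counts codepoints, not user-perceived characters.
    return len(unicodedata.normalize("NFC", text))


decomposed_e = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
precomposed_e = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
accented_q = "q\u0301"     # no precomposed form exists for this one

print(nfc_code_points(decomposed_e))        # composes to one codepoint
print(nfc_code_points(precomposed_e))       # already one codepoint
print(nfc_code_points(accented_q))          # stays two codepoints
print(len(precomposed_e.encode("utf-8")))   # size in UTF-8 bytes, for comparison
```

So a limit in codepoints and a limit in characters only disagree for sequences like the last one, where no precomposed form exists.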
|
I'm sure there was some historical technical limit of 140 ASCII or maybe 70 UTF-8 chars (or something else logical), but they probably had to accommodate people who wanted to use non-English characters in a post without getting a lecture on Unicode encoding, plus some slightly offensive "so ... people like you only get 70 chars" message.