by darkengine
3389 days ago
That is why I said "with moderate success". It's not 100% reliable, but it's mostly good enough for basic cases. Twitter, for example, used to apply NFC normalization and count code points to enforce the 140-character limit, and this was close to, or exactly, right for enforcing 140 graphemes in probably 98% of text on Twitter. They can no longer do this because some new emoji can take 7 code points for a single grapheme.
Twitter is not a "basic case"; it's a case where the length limit is arbitrary and the specifics don't matter much anymore. Usually, when you want to segment text, that is not the case.
Edit: basically, my problem with your original comment is that it helps spread the misinformation that a lot of these things are "just" emoji problems, so some folks tend to ignore them because they don't care much about emoji, and real human languages suffer.
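To make the point concrete, here is a rough Python sketch of the "NFC, then count code points" approach described above. It is only an illustration, not Twitter's actual code: the helper name `nfc_codepoint_count` and the sample strings are my own choices. Each sample is a single user-perceived character, so any count above 1 is a case where the shortcut overcounts.

```python
import unicodedata

def nfc_codepoint_count(text: str) -> int:
    """Count code points after NFC normalization (hypothetical helper)."""
    return len(unicodedata.normalize("NFC", text))

# Each sample is one user-perceived character (one grapheme cluster).
samples = {
    # e + COMBINING ACUTE ACCENT: NFC composes this to U+00E9.
    "e with acute": "e\u0301",
    # g + COMBINING DIAERESIS: no precomposed form, NFC leaves two code points.
    "g with diaeresis": "g\u0308",
    # TAMIL LETTER NA + TAMIL VOWEL SIGN I: not an emoji, still two code points.
    "Tamil ni": "\u0ba8\u0bbf",
    # Man, woman, girl, boy joined by three ZERO WIDTH JOINERs.
    "family emoji": "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466",
}

counts = {name: nfc_codepoint_count(s) for name, s in samples.items()}

for name, n in counts.items():
    print(f"{name}: {n} code point(s) after NFC")
```

Only the first sample comes out as 1. The Tamil syllable and the g with diaeresis are miscounted in the same way as the emoji, just less dramatically. Getting these right needs real grapheme cluster segmentation (UAX #29); in Python that means a third-party library, for example the `regex` module's `\X` pattern, which is not used here.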