by masklinn 5405 days ago
> The normalization twitter uses according to other posters in this thread always uses the multi-codepoint form when possible

You're confused: that is NFD (Normalization Form D, canonical decomposition). NFC is canonical decomposition followed by canonical composition, so it uses the precomposed codepoint wherever one exists.
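
To make the difference concrete, here is a minimal sketch using Python's standard `unicodedata` module. The sample character (é) is my own choice for illustration, not something from the thread.

```python
import unicodedata

composed = "\u00e9"      # é as a single precomposed codepoint
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# NFD splits the precomposed form into base letter + combining mark.
nfd = unicodedata.normalize("NFD", composed)
# NFC recombines the sequence into the precomposed form.
nfc = unicodedata.normalize("NFC", decomposed)

print([hex(ord(c)) for c in nfd])  # ['0x65', '0x301']
print([hex(ord(c)) for c in nfc])  # ['0xe9']
```

So the "multi-codepoint form when possible" is what NFD produces, and NFC goes the other way.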

> That's why it baffles me if they count by codepoint and not character!

Because "characters" are a fuzzy (if not meaningless) concept in Unicode, especially when talking about the implementation side. "Grapheme cluster" is well defined, but most languages have little support for it.

Counting codepoints is easy to implement and well defined, and in many cases an NFC codepoint will roughly map onto what users think of as a character.
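
A rough sketch of what counting by NFC codepoint looks like, again in Python. The helper name `nfc_length` is hypothetical, and this is not a claim about how any particular service implements its counter. It also shows why the mapping is only "roughly" right: a sequence with no precomposed form stays at two codepoints even though a user sees one character.

```python
import unicodedata

def nfc_length(text: str) -> int:
    """Count codepoints after NFC normalization (hypothetical helper)."""
    return len(unicodedata.normalize("NFC", text))

# 'e' + combining acute composes to a single codepoint under NFC.
print(nfc_length("cafe\u0301"))  # 4

# 'x' + combining acute has no precomposed form, so NFC leaves it
# as two codepoints although it displays as one character.
print(nfc_length("x\u0301"))     # 2
```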