by karteum
343 days ago
> Usually, what you want is either the byte or the grapheme cluster.

Exactly! That's what I understood after reading this great post: https://tonsky.me/blog/unicode/

"Even in the widest encoding, UTF-32, [some grapheme] will still take three 4-byte units to encode. And it still needs to be treated as a single character. If the analogy helps, we can think of the Unicode itself (without any encodings) as being variable-length."

I tend to think it's the biggest design decision in Unicode (but maybe I just don't fully see the need and use cases beyond emojis. Of course I read the section saying it's used in actual languages, but the few examples described could have been handled with dedicated 32-bit code points...)
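
For anyone who wants to check the quoted claim, here is a minimal Python sketch. The emoji is my own pick for illustration (man + zero-width joiner + factory, which renders as one "factory worker" glyph); the variable names are arbitrary. It only counts code points and code units; splitting text into grapheme clusters is a separate job that the standard library does not do for you.

```python
# One user-perceived character built from three code points:
# U+1F468 MAN, U+200D ZERO WIDTH JOINER, U+1F3ED FACTORY.
# (Illustrative choice, not necessarily the one elided in the quote.)
s = "\U0001F468\u200D\U0001F3ED"

code_points = len(s)                            # Python's len() counts code points
utf32_bytes = len(s.encode("utf-32-le"))        # 4 bytes per code point, no BOM
utf16_units = len(s.encode("utf-16-le")) // 2   # 2 bytes per UTF-16 code unit
utf8_bytes = len(s.encode("utf-8"))

print(code_points, utf32_bytes, utf16_units, utf8_bytes)  # 3 12 5 11
```

So even in UTF-32 the single visible character is three 4-byte units, and none of the four numbers is "1".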