by Retra
3207 days ago
Pfft, that is just as bad. There is no 'fundamental unit of text'. There are different units of text that are appropriate to different tasks. If I want to know how much memory to allocate, bytes are it. If I want to know how much screen space to allocate, font rendering metrics are it. If I want to do word-breaking, grapheme clusters are it. None of these are fundamental.

There are languages whose orthographies don't fit the Unicode grapheme cluster specification, but they're complex enough that I doubt there's any way to deal with them properly other than having someone proficient in them looking over your text processing or pawning it off to a library. At least with grapheme clusters your code won't choke on something as simple as unnormalized Latin text.
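
To make the "different units for different tasks" point concrete, here is a rough Python sketch. The helper names `approx_clusters` and `measure` are made up for illustration, and `approx_clusters` only glues combining marks onto the preceding character. That is an assumption-laden stand-in for real grapheme cluster segmentation, which has many more rules and is better left to a library.

```python
import unicodedata


def approx_clusters(text):
    """Very rough stand-in for grapheme clusters (hypothetical helper).

    Attaches each combining mark to the preceding character. It does NOT
    implement the full Unicode text segmentation rules.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # combining mark joins the previous cluster
        else:
            clusters.append(ch)  # anything else starts a new cluster
    return clusters


def measure(text):
    """Count the same string in three different units (hypothetical helper)."""
    return {
        "bytes": len(text.encode("utf-8")),       # what you allocate
        "code_points": len(text),                 # what Python's len() counts
        "clusters": len(approx_clusters(text)),   # roughly what a reader sees
    }


composed = "caf\u00e9"     # 'é' as a single precomposed code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent

print(measure(composed))    # {'bytes': 5, 'code_points': 4, 'clusters': 4}
print(measure(decomposed))  # {'bytes': 6, 'code_points': 5, 'clusters': 4}
```

Both strings render as the same four-character word, yet the byte and code point counts differ. Only the cluster count agrees, which is the sense in which cluster-based code doesn't choke on unnormalized Latin text.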