by millstone
4659 days ago
Code points are a better abstraction than code units, but they're still a piss-poor abstraction. Consider the problem of producing a valid substring from a Unicode string. It's important that you not split surrogate pairs, and it's true that working with code points spares you from that particular problem. But it's also important that you not split combining marks, or zero-width joiner sequences, or Hangul syllables... (see http://www.unicode.org/reports/tr29/ for all the gory details). An average programmer cannot correctly extract a substring from a Unicode string, whether given the code units or the code points. These abstractions are inadequate: instead you want something like grapheme clusters.
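To make the combining-mark case concrete, here's a rough Python sketch. Python strings are indexed by code point, so surrogate pairs aren't an issue, yet a plain slice still cuts an accent off its base letter. The `naive_clusters` helper is my own hypothetical name, and it only glues combining marks onto the preceding character. That's a deliberate simplification, not the full TR29 algorithm: it doesn't handle zero-width joiner sequences, Hangul jamo, or the other rules.

```python
import unicodedata


def naive_clusters(text):
    """Group each code point with the combining marks that follow it.

    Hypothetical helper for illustration only. It approximates grapheme
    clusters using the canonical combining class and ignores the other
    TR29 rules (ZWJ sequences, Hangul jamo, and so on).
    """
    clusters = []
    for ch in text:
        # A nonzero combining class means this code point attaches to
        # the character before it, so keep the two together.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "cafés" spelled with a decomposed é: 'e' followed by U+0301.
s = "cafe\u0301s"

# Slicing by code point keeps the 'e' but drops its accent.
by_code_point = s[:4]

# Slicing by (approximate) cluster keeps the 'e' and its accent together.
by_cluster = "".join(naive_clusters(s)[:4])
```

Here `by_code_point` comes out as plain `cafe`, while `by_cluster` keeps the accent attached. For real work you'd presumably want a library that implements the full TR29 segmentation rather than something like this.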