by account42
2250 days ago
Character (code point) iterators are useless. For parsing text-based formats, UTF-8 has the nice property that the encoded byte sequence of a character never appears as a substring of the encoding of any other character or sequence of other characters. This means splitting on UTF-8 byte sequences works just as well as splitting on code points. And for text editing you need to deal with grapheme clusters anyway, which can be made up of a variable number of code points, so having those be made up of a variable number of bytes doesn't make anything worse.
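
To make that concrete, here is a rough Python sketch (not from any particular library; the helper name `split_utf8_bytes` and the sample strings are made up for illustration). It splits the raw UTF-8 bytes on a multi-byte delimiter without decoding and checks that the result matches a split on the decoded string. It also shows a single grapheme cluster that is already more than one code point.

```python
def split_utf8_bytes(data: bytes, delimiter: bytes) -> list[bytes]:
    """Split raw UTF-8 bytes on a UTF-8 encoded delimiter, without decoding.

    Hypothetical helper: it relies on both arguments being valid UTF-8, so the
    delimiter's bytes can only match at real character boundaries.
    """
    return data.split(delimiter)


text = "naïve→café→日本語"   # sample input with 2- and 3-byte characters
delimiter = "→"              # U+2192, three bytes in UTF-8

# Split on code points (Python str) ...
by_code_points = text.split(delimiter)

# ... and on raw bytes, decoding each piece only afterwards for comparison.
by_bytes = [
    part.decode("utf-8")
    for part in split_utf8_bytes(text.encode("utf-8"), delimiter.encode("utf-8"))
]

assert by_bytes == by_code_points

# A single user-perceived character that is already two code points:
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
cluster = "e\u0301"
code_point_count = len(cluster)            # counts code points, not graphemes
byte_count = len(cluster.encode("utf-8"))  # counts UTF-8 bytes
```

Neither `code_point_count` nor `byte_count` tells an editor where the cursor may stop, which is why you need grapheme cluster segmentation on top of either representation.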