Hacker News new | ask | show | jobs
by account42 2250 days ago
Character (code point) iterators are useless.

For parsing text-based formats, UTF-8 has the nice property that the encoded byte sequence of a character is not a subsequence of the encoding for any other chracter or sequence of other characters. This means splitting on byte sequences of UTF-8 works just as well as spliting on code points.

And for text editing you need to deal with grapheme clusers anyway, which can be made up of a variable number of code points - so having these be made up of a variable number of bytes doesn't make anything worse.