by int_19h
3211 days ago
> Not exactly. The library just has to treat it as a string and not worry about the encoding (i.e. not try to encode it to/from the unicode type). Or do anything else that implies encoding. Like measure length, index, slice, change case etc.
Slicing works fine on a UTF-8 string because I'm slicing at ASCII characters, and ASCII bytes never appear inside a multi-byte (non-ASCII) character. If I needed to slice at particular code points it would still be easy: I just look for the corresponding 2-4 byte sequence and slice before or after it. Python has no built-in grapheme support, so I couldn't do much with those anyway.
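A minimal Python sketch of why this works, assuming the data is held as valid UTF-8 `bytes`: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so splitting on an ASCII byte, or searching for the complete encoded byte sequence of a code point, can never land in the middle of a character. The helper names `split_utf8_on_ascii` and `slice_before_codepoint` are hypothetical, for illustration only.

```python
from __future__ import annotations


def split_utf8_on_ascii(data: bytes, sep: bytes) -> list[bytes]:
    # Safe because ASCII bytes (< 0x80) never occur inside a multi-byte UTF-8 sequence.
    if len(sep) != 1 or sep[0] >= 0x80:
        raise ValueError("separator must be a single ASCII byte")
    return data.split(sep)


def slice_before_codepoint(data: bytes, ch: str) -> bytes:
    # Search for the full 1-4 byte encoding of the code point. UTF-8 lead bytes and
    # continuation bytes are disjoint, so a match can only start on a character boundary.
    idx = data.find(ch.encode("utf-8"))
    return data if idx == -1 else data[:idx]


line = "naïve,café,日本".encode("utf-8")
print([field.decode("utf-8") for field in split_utf8_on_ascii(line, b",")])
# ['naïve', 'café', '日本']
print(slice_before_codepoint("price: 5€ each".encode("utf-8"), "€").decode("utf-8"))
# price: 5
```

No decoding to `str` is needed for either operation; each field decodes cleanly afterwards because no cut ever splits a character.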
Measuring length doesn't come up for me, and indexing to an absolute position in a string never comes up at all.
But yes, if I did have to call a text-processing library, I'd have to decode to the Unicode type first and encode back afterwards. That's rare enough that I can keep everything in UTF-8.
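A minimal sketch of decoding only at that boundary, assuming UTF-8 `bytes` everywhere else; `with_text` is a hypothetical helper, not part of any library. It also shows why length and case changes are the operations that force a decode: `len` on `bytes` counts bytes rather than code points, and `bytes.upper()` only changes ASCII letters.

```python
from typing import Callable


def with_text(data: bytes, fn: Callable[[str], str]) -> bytes:
    # Decode UTF-8 only for the duration of a text-processing call, then re-encode.
    return fn(data.decode("utf-8")).encode("utf-8")


raw = "straße café".encode("utf-8")

print(len(raw))                   # 13 (bytes: ß and é take two bytes each)
print(len(raw.decode("utf-8")))   # 11 (code points)
print(raw.upper())                # b'STRA\xc3\x9fE CAF\xc3\xa9' (only ASCII letters change)
print(with_text(raw, str.upper))  # b'STRASSE CAF\xc3\x89' (full Unicode case mapping)
```

Everything outside the `with_text` call stays as UTF-8 bytes, so the encode/decode cost is paid only where a text-aware operation actually needs it.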