|
|
|
|
|
by gf000
296 days ago
|
|
How could any library function work with completely random bytes? Like, how would it iterate over code points? It may want to assume utf8's standard rules and e.g. know that after this byte prefix, the next byte is also part of the same code point (excuse me if I'm using wrong terminology), but now you need complex error handling at every single line, which would be unnecessary if you just made your type represent only valid instances. Again, this is the same simplistic, vs just the right abstraction, this just smudges the complexity over a much larger surface area. If you have a byte array that is not utf-8 encoded, then just... use a byte array. |
|
The entire point of UTF-8 (designed, by the way, by the group that designed Go) is to encode Unicode in such a way that these byte string operations perform the corresponding Unicode operations, precisely so that you don't have to care whether your string is Unicode or just plain ASCII, so you don't need any error handling, except for the rare case where you want to do something related to the text that the string semantically represents. The only operation that doesn't really map is measuring the length.