Have you never had to indicate how many bytes you are sending over a stream, say in the Content-Length of an HTTP response? Have you never put strings into a byte buffer? But that doesn't matter. Let's say you are correct: when working with strings, you more often want the rune length. It still wouldn't have been the right decision, given the other design decisions of Go, because it would have needlessly complicated things for only arguable benefit. Let me show you what I mean. The len() function works with a whole lot of things: strings, arrays, slices, maps and channels. For the first three, len() returns the number of elements in the underlying array, and for a string those elements are bytes. All three are backed by an array, and so sensibly have similar semantics. It would have violated the principle of least surprise for anyone who knew the language to have one array-backed type not return its element count. Both the language developers and its users would have to special-case strings, in code and in their brains. Now, they could have decided to do it anyway, but then another surprise awaits. What happens when you take a slice of a string? Slice indices are byte offsets, so len() would no longer agree with the indices you slice by. Oh no, more special casing and more complication for everyone. The Go developers do special-case where doing so would clearly be a win for their users. Consider range, which iterates by runes over a string, potentially moving the index on the underlying array forward by more than 1 on each pass. That is clearly the most common use case and so was worth doing. It also eliminates many of the use cases where getting the length of a string in runes would matter to you. Not all, but a lot.
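To make the distinction concrete, here is a minimal sketch. The sample string and the helper name runeStarts are my own, picked for illustration; the rest is the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStarts is a hypothetical helper: it returns the byte index at which
// each rune of s begins, collected with a range loop over the string.
func runeStarts(s string) []int {
	var starts []int
	for i := range s {
		starts = append(starts, i)
	}
	return starts
}

func main() {
	s := "h\u00e9llo" // "héllo"; U+00E9 takes two bytes in UTF-8

	fmt.Println(len(s))                    // 6: bytes
	fmt.Println(utf8.RuneCountInString(s)) // 5: runes
	fmt.Println(len([]rune(s)))            // 5: elements of a rune slice
	fmt.Println(runeStarts(s))             // [0 1 3 4 5]: range skips index 2
}
```

When you do want the rune count, utf8.RuneCountInString gives it without allocating a []rune.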
What happens when you slice a Unicode string in Go is that it cuts multi-byte characters right in half, unless you get the byte boundaries just right. I know real programmers keep the byte boundaries for all the chars in all their strings in their heads at all times, but for people like me this basically makes string slicing unusable for non-ASCII text.
Python somehow magically slices Unicode strings without chopping characters in half.
In Python:
In Go:
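The Go snippet is missing from this copy of the thread. What follows is a minimal sketch of the behavior described, not the commenter's original code; the sample string and the helper name firstRunes are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes is a hypothetical helper: it returns the first n runes of s
// by converting to []rune, so it never splits a multi-byte character.
func firstRunes(s string, n int) string {
	r := []rune(s)
	if n < 0 {
		n = 0
	}
	if n > len(r) {
		n = len(r)
	}
	return string(r[:n])
}

func main() {
	s := "h\u00e9llo" // "héllo"; U+00E9 takes two bytes in UTF-8

	cut := s[:2] // byte slice: 'h' plus only the first byte of the 'é'
	fmt.Println(utf8.ValidString(cut)) // false: the character was cut in half

	whole := firstRunes(s, 2) // rune slice: 'h' and the complete 'é'
	fmt.Println(utf8.ValidString(whole)) // true
	fmt.Println(len(whole))              // 3: two runes, three bytes
}
```

Converting to []rune copies the string, so this trades an allocation for indices that count characters rather than bytes.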