>Python somehow magically slices unicode strings without chopping characters in half.

You need a byte offset to slice a UTF-8 string, and it's impossible to convert a Unicode rune offset to a byte offset without parsing the string from the start up to that point. I'm not all that familiar with Python, but if the language works as you implied, it is basically doing this behind the scenes in common string-processing tasks:

1. The user calls some kind of pattern-matching function (or whatever) to find where they want to split the string. Python returns a rune index.
2. The user tells Python to split the string at that rune index. Python promptly starts parsing the string all over again until it finds the right byte boundaries.
3. The language then actually creates the new string in between the byte boundaries.
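To make step 2 concrete, here is a minimal Go sketch, assuming the string is stored as UTF-8 (Go source files are UTF-8, so Go string literals are UTF-8 byte sequences). Because a UTF-8 character takes 1 to 4 bytes, finding the byte offset of the n-th rune means decoding everything before it. `runeToByteOffset` is a hypothetical helper written for this illustration, not a standard-library function.

```go
package main

import "fmt"

// runeToByteOffset is a hypothetical helper: it returns the byte offset of
// the rune at index n in s (n assumed non-negative), or len(s) if n is at or
// past the end. It has to decode s from the beginning, so the cost grows
// with the length of the prefix.
func runeToByteOffset(s string, n int) int {
	i := 0
	// Ranging over a string decodes UTF-8 and yields each rune's byte offset.
	for off := range s {
		if i == n {
			return off
		}
		i++
	}
	return len(s)
}

func main() {
	s := "héllo, wörld" // "é" and "ö" are two bytes each in UTF-8
	start := runeToByteOffset(s, 1)
	end := runeToByteOffset(s, 4)
	fmt.Println(start, end)     // 1 5
	fmt.Println(s[start:end])   // éll
}
```

Each call rescans from the beginning, so converting both ends of a slice this way walks the prefix twice; that repeated scan is the overhead described in the steps above.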
Sure, a Python implementation could statically optimize this, but... why should it have to in the first place? That's fucking stupid and should be considered a language bug when it could be doing this:

1. The user pattern-matches, blah blah blah, and gets a byte index.
2. User tells their sane language to split the string apart at the byte index and it just does so.
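For comparison, a sketch of the byte-index approach in Go, the language the OP is about. `strings.Index` returns a byte offset (or -1), so the result goes straight into a slice expression with no second scan. The input string and separator are made up for illustration, and `splitOnce` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// splitOnce is a hypothetical helper: it splits s around the first
// occurrence of sep, using the byte offset returned by strings.Index.
func splitOnce(s, sep string) (before, after string, found bool) {
	i := strings.Index(s, sep) // byte offset, or -1 if sep is absent
	if i < 0 {
		return s, "", false
	}
	// The byte offset is used directly; nothing is re-decoded.
	return s[:i], s[i+len(sep):], true
}

func main() {
	s := "héllo, wörld"
	fmt.Println(strings.Index(s, ", ")) // 6 (a byte offset; the rune index would be 5)
	before, after, ok := splitOnce(s, ", ")
	fmt.Println(before, after, ok) // héllo wörld true
}
```

Since Go 1.18 the standard library's `strings.Cut(s, sep)` returns the same `before, after, found` triple, so in real code that would be the idiomatic choice.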
>I know real programmers keep the byte boundaries for all the chars in all their strings in their head at all times

When the hell would you have to remember the byte or rune boundaries for characters in the first place? Why would you be slicing up a string with magic-number indices? If you're getting indices from pattern-matching functions, you shouldn't care whether they're in bytes or bits or nibbles; you should just be passing them on to your language's split routines (or whatever else you wanted to do). Unless, of course, you're the one actually writing low-level string-processing routines, in which case rune offsets are far less useful than byte offsets for the reason explained above.

This Python "feature" seems to exist entirely to keep newbies from getting confused when they slice up strings in their REPL, for I cannot fathom a reason why anyone would write "s[1:4]" in production code. IIRC, Python was designed for pedagogy, so I'm not surprised that it would take on such a pointless implementation cost just to spare teachers from explaining why "s[1:4] gave me question marks".
This might explain why my comments seem like heresy to you. I would point out that the OP is about Python programmers switching to Go.