|
|
|
|
|
by spangry
3403 days ago
|
|
You're right, I do lean towards agreeing with the author of the blog post. However, I wasn't (and am still not) in any way certain, and didn't want to be one of those asses you see on the internet who turn everything into a religious war. So I just put the information out there because I (in my ignorance) thought it was useful information from which intelligent people could draw their own conclusions. Honestly, I don't care a great deal about string handling in Python and just wanted to inject (what I thought was) more information into the discussion. I'm kinda regretting that now. Lesson learned: steer clear and leave it to the experts. I'm curious, how does Go handle strings? |
|
Go handles strings by having strings be like Python3's byte-strings and unicode-strings as one type. This enables code to be written that generally doesn't force you to very often think about encodings, which you shouldn't have to as UTF8 is the one true encoding.[1] Or litter your code with encode/decode, or receive exceptions from strings (see Zed's post on some of that) where there wasn't previously. Python3 solved the unicode emojibake mixing unicode and bytes problems that some developers created for themselves in Python2, but did so by forcing the burden to every single Python3 developer, and breaking everyone's code while simultaneously refusing to engineer an interpreter that could run both CPython2/3 bytecode. Which is possible, the Hotspot JVM and .NET CLR prove it. Shifting additional burdens to the developer in situations where it's necessary, makes sense. It wasn't here because of both Python's general abstraction level, and Go showed it can be solved elegantly. Strings are just bytes and they're assumed to be UTF8 encoding. Everyone wins. Only Windows-specific implementations like the original .Net CLR shouldn't be UTF-8 by default, internal and external representation. Only a diehard Windows-centric person would disagree or someone with a legacy implementation (Java, C#, Python3 etc). The CPython3 maintainers fully admit they're leaning towards the Windows way of handling strings.
As you know, handling text/bytes is fairly critical and fundamental. For Python3 to get this wrong with such a late stage breaking change with no effort to make up for it with a unified bytecode VM is unfortunate. Add in the feature soup and the whole thing is a mess.
[0]http://learning-python.com/books/python-changes-2014-plus.ht...
[1]http://utf8everywhere.org/