by BuckRogers 3403 days ago
I normally wouldn't respond to a comment that is merely a link to someone else's thoughts with nothing original of your own, because it usually means you don't know what you're talking about and are merely attempting to speak through someone else because you think you agree. So I will respond not for your benefit but for anyone else who is new to Python and may come across this.

I've read that before and the author is ignorant. He's parroting GvR and the CPython core development team's line that Unicode strings are sequences of code points. Sure, but he's arguing with himself; note that the argument is framed as Python2 vs. Python3. That narrow focus is what results in his tunnel vision. Within the argument as he frames it, Python3 is not better than Python2 at string handling, it's merely different. One favors POSIX (Linux): Python2. One favors the Windows way of doing things: Python3.

There is an outright better way to handle strings. It's what Google did with Go. How do we know it's better? It makes more sense on technical merits, and members of the CPython core dev team have admitted that if Python3 were designed today they would go down this path. But during the initial Python3000 talks this option was not as obvious. Bad timing or poor implementation choices, take your pick; given the runaway feature soup that Python3 has become, I'd assume both.

So, like all tech, let Python3 live or die on its technical merits. That's exactly what the PSF has been afraid of, so we have the 2020 date, which is nothing more than one political stunt among others. Python3 is merely different: it favors one use case over another, but did not outright make Python better. Breaking a language for technical churn is and was a terrible idea.

1 comments

You're right, I do lean towards agreeing with the author of the blog post. However, I wasn't (and am still not) in any way certain, and didn't want to be one of those asses you see on the internet who turn everything into a religious war. So I just put the information out there because I (in my ignorance) thought it was useful information from which intelligent people could draw their own conclusions.

Honestly, I don't care a great deal about string handling in Python and just wanted to inject (what I thought was) more information into the discussion. I'm kinda regretting that now. Lesson learned: steer clear and leave it to the experts.

I'm curious, how does Go handle strings?

Well, I'm not an expert, but in the interest of full disclosure, Guido and the CPython core dev team aren't either. They hold a myriad of excuses for their decisions, and they all look highly suspect even to a casual observer who doesn't just drink the Kool-Aid. In the end, they'll just tell you they maintain CPython so only their opinion matters. Fair, but they're still wrong. Python3 is controversial for good reasons.[0] It's not lazy folks or whatever ad hominem is out there today. I couldn't tell your intentions given the lack of information included with your post.

Go handles strings by having a single string type that plays the role of both Python3's byte strings and its Unicode strings. This lets you write code that rarely forces you to think about encodings, which you shouldn't have to, as UTF-8 is the one true encoding.[1] You also don't have to litter your code with encode/decode calls, or get exceptions from strings (see Zed's post on some of that) where there were none previously. Python3 solved the unicode emojibake problems that some developers created for themselves in Python2 by mixing unicode and bytes, but did so by pushing the burden onto every single Python3 developer and breaking everyone's code, while simultaneously refusing to engineer an interpreter that could run both CPython2 and CPython3 bytecode. Which is possible; the HotSpot JVM and .NET CLR prove it. Shifting additional burdens to the developer makes sense in situations where it's necessary. It wasn't necessary here, both because of Python's general abstraction level and because Go showed the problem can be solved elegantly. Strings are just bytes and they're assumed to be UTF-8 encoded. Everyone wins. Only Windows-specific implementations like the original .NET CLR shouldn't be UTF-8 by default, in both internal and external representation. Only a diehard Windows-centric person would disagree, or someone with a legacy implementation (Java, C#, Python3, etc.). The CPython3 maintainers fully admit they're leaning towards the Windows way of handling strings.
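
To make "strings are just bytes, assumed to be UTF-8" concrete, here is a minimal Go sketch. The helper names (byteLen, runeCount, roundTrip, isValidUTF8) are made up for illustration; only len, range, the string/[]byte conversions, and the standard unicode/utf8 package are real Go. It is meant as a rough demonstration, not a full account of Go's string semantics.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteLen (hypothetical helper): len on a Go string counts bytes, not characters.
func byteLen(s string) int { return len(s) }

// runeCount (hypothetical helper): counts Unicode code points, treating s as UTF-8.
func runeCount(s string) int { return utf8.RuneCountInString(s) }

// roundTrip (hypothetical helper): string <-> []byte conversion takes no codec
// argument and cannot fail; the bytes are copied as-is.
func roundTrip(s string) string { return string([]byte(s)) }

// isValidUTF8 (hypothetical helper): a string may hold arbitrary bytes, so
// validity is something you check explicitly only when you care about it.
func isValidUTF8(s string) bool { return utf8.ValidString(s) }

func main() {
	s := "h\u00e9llo" // the second character is U+00E9, two bytes in UTF-8

	fmt.Println(byteLen(s))   // 6
	fmt.Println(runeCount(s)) // 5

	// range decodes UTF-8 on the fly: i is the byte offset, r the code point.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println()

	fmt.Println(roundTrip(s) == s)   // true
	fmt.Println(isValidUTF8("\xff")) // false; no panic, the string simply holds that byte
}
```

The point of the sketch is that there is one string type: byte-level operations and code-point-level operations both work on it, and nothing raises an error just because the bytes happen not to be valid UTF-8.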

As you know, handling text/bytes is fairly critical and fundamental. For Python3 to get this wrong with such a late-stage breaking change, and with no effort to make up for it with a unified bytecode VM, is unfortunate. Add in the feature soup and the whole thing is a mess.

[0]http://learning-python.com/books/python-changes-2014-plus.ht...

[1]http://utf8everywhere.org/

Also, it's worth noting that two of Go's creators (Ken Thompson and Rob Pike) _invented_ UTF-8.
Minor nit: s/emojibake/mojibake/