Hacker News new | ask | show | jobs
by BuckRogers 3406 days ago
Don't blame the community for that. Blame Guido and the core development team. The community at-large weren't rallying for breaking changes to the language. GvR wanted that, it's "his" language afterall. Of course some of the community are more conservative and were willing to more quickly fall in line to goosestep with the core team on Python3. So the tension stems from the Python3 folks having extreme disdain for all the Python2 users and codebases.

I agree with you though, it's what got me looking towards other tech like Node, Go, Elixir. I can look past it all except for the fact they got 'unicode by default' wrong. They should simply do what Go does, everything is a bytestring with assumed encoding as UTF8. What they have today is ridiculous and needs changed before I could embrace Python3+.

2 comments

There were definitely people wanting unicode support, which was the breaking feature.

They supported python 2 for years and year after the release of this, don't know how you can even mention the word goosestep in this context.

I can't understand what's going on with the Python community for folks to even make statements like yours. It has to be an influx of new Python developers who started on 3.

Python2 has unicode support. Python2 already had (better) async IO with Gevent (Tornado is also there). There's just nothing there with Python3 except what I can only describe as propaganda from the Python Software Foundation that has led to outright ignorance with many users.

Some people wanted "unicode strings by default". Which Python3 does have, but they even got that wrong. The general consensus on how to handle this correctly is to make everything a bytestring and assume the encoding as being UTF8 by default. Then like Python2, it just works.

Sorry, why you say python 2 asyncio is better? Python 3.5 - 3.6 new native! asyncio looks great https://maketips.net/tip/146/parallel-execution-of-asyncio-f...
> The general consensus on how to handle this correctly is to make everything a bytestring and assume the encoding as being UTF8 by default.

Whose "general consensus"? It's certainly not the approach adopted by the vast majority of mainstream general-purpose programming languages out there.

You mean most of the mainstream, genpurpose languages in usage today which were initially designed 20-50 years ago. And those which don't actively look for opportunities to shoot themselves in the face with sidestepping changes to break the language? Then no, not that consensus.

But if you were to find any sensible designer of new language with skyrocketing, runaway popularity- such as Rob Pike, they'll tell you differently. Or even Guido van Rossum, if he were to be honest with you. While Pike's colleagues at Bell Labs like Dennis Ritchie may not have designed C this way for obvious reasons, they did design Go that way.

So now it's the consensus of "sensible designers of new languages". Where "sensible" is very subjective, and I have a feeling that your definition of it would basically presuppose agreeing with your conceptual view of strings, begging the question.

Aside from Go, can you give any other examples? Swift is obviously in the "new language" category (more so than Go, anyway), and yet it didn't go down that path.

Well, do your research and come to your own conclusions. Most people are going to agree that UTF8 is the way to go. You can advocate something else, since you seem to take affront to my opposition to Python3's Microsoft-oriented implementation.

If you know anything about Swift, it was designed with a primary goal to being a smooth transition from and interop with ObjC so like other legacy implementations (such as CPython3), it had sacrifices that limited how forward-looking it could be.

I normally wouldn't respond to any comments that are merely a link to someone elses thoughts without something original of your own. Because it means you likely don't know what you're talking about and merely attempting to speak through someone else because you think you agree. So I will respond not to your benefit but for anyone else who is new to Python and may come across this.

I've read that before and the author is ignorant. He's parroting GvR & the CPython core development team's line that unicode strings are codepoints. Sure, but he's arguing with himself and note the argument is Python2 vs 3. That narrow focus is what results in his tunnelvision. As a result of the argument as he frames it, Python3 is not better than Python2 in string handling, it's merely different. One favors POSIX (Linux), Python2. One favors the Windows way of doing things, Python3.

There is an outright better way to handle strings. It's what Google did with Go. How do we know it's better? Well, it is because it makes more sense on technical merits and members of the CPython core dev team have admitted that if Python3 were designed today they would go down this path. But during the initial Python3000 talks this option was not as obvious. Bad timing or poor implementation choices. Take your pick, given the runaway feature-soup that Python3 has become I'd assume both.

So like all tech, let Python3 live or die on its technical merits. That's exactly what the PSF has been afraid of, so we have the 2020 date which is nothing more than a political stunt among others. Python3 is merely different, it favors one usecase over another, but did not outright make Python better. To break a language for technical churn is and was a terrible idea.

You're right, I do lean towards agreeing with the author of the blog post. However, I wasn't (and am still not) in any way certain, and didn't want to be one of those asses you see on the internet who turn everything into a religious war. So I just put the information out there because I (in my ignorance) thought it was useful information from which intelligent people could draw their own conclusions.

Honestly, I don't care a great deal about string handling in Python and just wanted to inject (what I thought was) more information into the discussion. I'm kinda regretting that now. Lesson learned: steer clear and leave it to the experts.

I'm curious, how does Go handle strings?

Well, I'm not an expert but in effort for full-disclosure Guido and the CPython core dev team aren't either. They hold a myriad of excuses for their decisions and they're all highly suspect from even a casual observer that doesn't just drink the koolaid. In the end, they'll just tell you they maintain CPython so only their opinion matters. Fair, but they're still wrong. Python3 is controversial for good reasons.[0] It's not lazy folks or whatever ad hominem is out there today. I couldn't tell your intentions given the lack of information included with your post.

Go handles strings by having strings be like Python3's byte-strings and unicode-strings as one type. This enables code to be written that generally doesn't force you to very often think about encodings, which you shouldn't have to as UTF8 is the one true encoding.[1] Or litter your code with encode/decode, or receive exceptions from strings (see Zed's post on some of that) where there wasn't previously. Python3 solved the unicode emojibake mixing unicode and bytes problems that some developers created for themselves in Python2, but did so by forcing the burden to every single Python3 developer, and breaking everyone's code while simultaneously refusing to engineer an interpreter that could run both CPython2/3 bytecode. Which is possible, the Hotspot JVM and .NET CLR prove it. Shifting additional burdens to the developer in situations where it's necessary, makes sense. It wasn't here because of both Python's general abstraction level, and Go showed it can be solved elegantly. Strings are just bytes and they're assumed to be UTF8 encoding. Everyone wins. Only Windows-specific implementations like the original .Net CLR shouldn't be UTF-8 by default, internal and external representation. Only a diehard Windows-centric person would disagree or someone with a legacy implementation (Java, C#, Python3 etc). The CPython3 maintainers fully admit they're leaning towards the Windows way of handling strings.

As you know, handling text/bytes is fairly critical and fundamental. For Python3 to get this wrong with such a late stage breaking change with no effort to make up for it with a unified bytecode VM is unfortunate. Add in the feature soup and the whole thing is a mess.

[0]http://learning-python.com/books/python-changes-2014-plus.ht...

[1]http://utf8everywhere.org/

Also, it's worth noting that creators of go _invented_ utf8.
Minor nit: s/emojibake/mojibake/