Hacker News new | ask | show | jobs
by soperj 3404 days ago
There were definitely people wanting unicode support, which was the breaking feature.

They supported python 2 for years and year after the release of this, don't know how you can even mention the word goosestep in this context.

1 comments

I can't understand what's going on with the Python community for folks to even make statements like yours. It has to be an influx of new Python developers who started on 3.

Python2 has unicode support. Python2 already had (better) async IO with Gevent (Tornado is also there). There's just nothing there with Python3 except what I can only describe as propaganda from the Python Software Foundation that has led to outright ignorance with many users.

Some people wanted "unicode strings by default". Which Python3 does have, but they even got that wrong. The general consensus on how to handle this correctly is to make everything a bytestring and assume the encoding as being UTF8 by default. Then like Python2, it just works.

Sorry, why you say python 2 asyncio is better? Python 3.5 - 3.6 new native! asyncio looks great https://maketips.net/tip/146/parallel-execution-of-asyncio-f...
> The general consensus on how to handle this correctly is to make everything a bytestring and assume the encoding as being UTF8 by default.

Whose "general consensus"? It's certainly not the approach adopted by the vast majority of mainstream general-purpose programming languages out there.

You mean most of the mainstream, genpurpose languages in usage today which were initially designed 20-50 years ago. And those which don't actively look for opportunities to shoot themselves in the face with sidestepping changes to break the language? Then no, not that consensus.

But if you were to find any sensible designer of new language with skyrocketing, runaway popularity- such as Rob Pike, they'll tell you differently. Or even Guido van Rossum, if he were to be honest with you. While Pike's colleagues at Bell Labs like Dennis Ritchie may not have designed C this way for obvious reasons, they did design Go that way.

So now it's the consensus of "sensible designers of new languages". Where "sensible" is very subjective, and I have a feeling that your definition of it would basically presuppose agreeing with your conceptual view of strings, begging the question.

Aside from Go, can you give any other examples? Swift is obviously in the "new language" category (more so than Go, anyway), and yet it didn't go down that path.

Well, do your research and come to your own conclusions. Most people are going to agree that UTF8 is the way to go. You can advocate something else, since you seem to take affront to my opposition to Python3's Microsoft-oriented implementation.

If you know anything about Swift, it was designed with a primary goal to being a smooth transition from and interop with ObjC so like other legacy implementations (such as CPython3), it had sacrifices that limited how forward-looking it could be.

I'm not at all opposed to UTF-8 as internal encoding for strings. But that's completely different and orthogonal to what you're talking about, which is whether strings and byte arrays should be one and the same thing semantically, and represented by the same type, or by compatible types that are used interchangeably.

I want my strings to be strings - that is, an object for which it is guaranteed that enumerating codepoints always succeeds (and, consequently, any operation that requires valid codepoints, like e.g. case-insensitive comparison, also succeeds). This is not the case for "just assume a bytearray is UTF-8 if used in string context", which is why I consider this model broken on the abstraction level. It's kinda similar to languages that don't distinguish between strings and numbers, and interpret any numeric-looking string as number in the appropriate context.

FWIW, the string implementation in Python 3 supports UTF-8 representation. And it could probably be changed to use that as the primary canonical representation, only generating other ones if and when they're requested.