Hacker News new | ask | show | jobs
by BuckRogers 3404 days ago
I can't understand what's going on with the Python community for folks to even make statements like yours. It has to be an influx of new Python developers who started on 3.

Python2 has unicode support. Python2 already had (better) async IO with Gevent (Tornado is also there). There's just nothing there with Python3 except what I can only describe as propaganda from the Python Software Foundation that has led to outright ignorance with many users.

Some people wanted "unicode strings by default". Which Python3 does have, but they even got that wrong. The general consensus on how to handle this correctly is to make everything a bytestring and assume the encoding as being UTF8 by default. Then like Python2, it just works.

2 comments

Sorry, why you say python 2 asyncio is better? Python 3.5 - 3.6 new native! asyncio looks great https://maketips.net/tip/146/parallel-execution-of-asyncio-f...
> The general consensus on how to handle this correctly is to make everything a bytestring and assume the encoding as being UTF8 by default.

Whose "general consensus"? It's certainly not the approach adopted by the vast majority of mainstream general-purpose programming languages out there.

You mean most of the mainstream, genpurpose languages in usage today which were initially designed 20-50 years ago. And those which don't actively look for opportunities to shoot themselves in the face with sidestepping changes to break the language? Then no, not that consensus.

But if you were to find any sensible designer of new language with skyrocketing, runaway popularity- such as Rob Pike, they'll tell you differently. Or even Guido van Rossum, if he were to be honest with you. While Pike's colleagues at Bell Labs like Dennis Ritchie may not have designed C this way for obvious reasons, they did design Go that way.

So now it's the consensus of "sensible designers of new languages". Where "sensible" is very subjective, and I have a feeling that your definition of it would basically presuppose agreeing with your conceptual view of strings, begging the question.

Aside from Go, can you give any other examples? Swift is obviously in the "new language" category (more so than Go, anyway), and yet it didn't go down that path.

Well, do your research and come to your own conclusions. Most people are going to agree that UTF8 is the way to go. You can advocate something else, since you seem to take affront to my opposition to Python3's Microsoft-oriented implementation.

If you know anything about Swift, it was designed with a primary goal to being a smooth transition from and interop with ObjC so like other legacy implementations (such as CPython3), it had sacrifices that limited how forward-looking it could be.

I'm not at all opposed to UTF-8 as internal encoding for strings. But that's completely different and orthogonal to what you're talking about, which is whether strings and byte arrays should be one and the same thing semantically, and represented by the same type, or by compatible types that are used interchangeably.

I want my strings to be strings - that is, an object for which it is guaranteed that enumerating codepoints always succeeds (and, consequently, any operation that requires valid codepoints, like e.g. case-insensitive comparison, also succeeds). This is not the case for "just assume a bytearray is UTF-8 if used in string context", which is why I consider this model broken on the abstraction level. It's kinda similar to languages that don't distinguish between strings and numbers, and interpret any numeric-looking string as number in the appropriate context.

FWIW, the string implementation in Python 3 supports UTF-8 representation. And it could probably be changed to use that as the primary canonical representation, only generating other ones if and when they're requested.

A default UTF8 string-type has to be allowed to be used interchangeably with bytestrings since ASCII is a valid subset. Your string type shouldn't be spellchecking nor checking for complete sentences either. What comes in, comes in. Validate it elsewhere.

Thus Go's strings don't potentially fail your desire for a guarantee anymore than anything else would assuming UTF8. They're unicode-by-default, which was the whole point to Python3 but Go has it too in a more elegant way. That's the beauty to UTF8 by default, you can pass it valid UTF8 or ASCII since it's a subset, the context of which it's being received is up to you. If you're expecting bytes it works, if you're expecting unicode codepoints that works. There's no reason to get your hands dirty with encodings unless you need to decode UTF16 etc first. If there is still a concern about data validation, that's up to you not your string type to throw an exception.