The way I usually put it is that Python 3 shifted its priorities. Previously, if you were a UNIX-y scripter writing UNIX-y scripts on your UNIX-y OS, the fact that Python just kind of pretended encoding issues would never exist was a help to you. It adopted the same "everything is ASCII, or at most UTF-8 in the ASCII range, and if it isn't I'll break in cryptic ways" approach as most other UNIX-y scripting things.

If you were doing anything other than UNIX-y scripting on your UNIX-y OS, this easily became a huge nightmare in Python 2. Django went through a massive rewrite early in its history precisely because of this, to ensure that encoding/decoding happened at the boundaries and everything you'd work with inside a Django app was already a Unicode string. And I remember what it was like trying to work on the web before that approach, and what the work to fix it was like.

Python 3 decided to make the UNIX-y scripters actually learn what a horrid mess UNIX is with respect to locales and encodings and filesystem paths, in order to free the rest of the Python community from the nightmares inflicted by prioritizing the UNIX-y scripters to the exclusion of everyone else.

So yes, you have to do more work. Yes, you have to learn that a file path is actually an opaque bag of bytes that may not be in any actual encoding and thus can never properly decode to a string. Yes, you have to learn to use fsencode() and the surrogateescape handler in order not to blow up your scripts. But I'm OK with that, because it puts the workload on you when you're using such a system, rather than magically trying to fix it for you at the cost of everyone else's sanity.

It also means that you have to learn to write those scripts correctly. Which is more work than what Python 2 required, but not the world-ending apocalyptic horror it's usually presented as (and is, again, mostly the fault of UNIX-y systems doing their old UNIX-y things, not the fault of Python).
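To make the fsencode() point concrete, here is a minimal sketch of one way a script could turn a path into something safe to print. The helper name `display_name` is made up for illustration, and showing names as UTF-8 with escaped leftovers is just one possible policy, not the only correct one.

```python
import os

def display_name(path):
    """Return a printable form of a path (str or bytes) without raising."""
    # os.fsencode turns a str from the file system APIs back into the
    # original bytes; bytes input passes through unchanged.
    raw = os.fsencode(path)
    # Assumption: display names as UTF-8 and escape any bytes that are
    # not valid UTF-8 instead of failing.
    return raw.decode("utf-8", "backslashreplace")

raw_name = b"caf\xe9.txt"          # Latin-1 bytes, not valid UTF-8
print(display_name(raw_name))      # prints: caf\xe9.txt
print(display_name("plain.txt"))   # prints: plain.txt
```

The same helper accepts the str values returned by os.listdir(), since on POSIX those round-trip through os.fsencode() back to the original bytes.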
> Python 3 decided to make the UNIX-y scripters actually learn what a horrid mess UNIX is with respect to locales and encodings and filesystem paths
Show me how much Python 3 improves this. To expand on the program before, make it a directory named 'input' full of files, and a directory named 'output' to put the processed files in. Print each file name as the corresponding file is processed to indicate progress.
I would be applauding Python if it did make the difficulties with this exercise obvious, but it absolutely does not. The file system APIs return strings, but the strings they return may not be valid Unicode. PEP 383 turned the Python 3 str type into a bag of bytes.
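A small sketch of what that looks like in practice. It skips the file system entirely and applies the surrogateescape handler by hand, assuming a UTF-8 file system encoding; the helper `is_valid_unicode` is a name invented for this example.

```python
raw_name = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# What the file system APIs do under PEP 383: the undecodable byte 0xE9
# is smuggled into the str as the lone surrogate U+DCE9.
name = raw_name.decode("utf-8", "surrogateescape")

def is_valid_unicode(s):
    """Return True if s can be encoded as strict UTF-8."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True

print(ascii(name))              # prints: 'caf\udce9.txt'
print(is_valid_unicode(name))   # prints: False

# The original bytes are only recoverable with the same error handler.
assert name.encode("utf-8", "surrogateescape") == raw_name
```

Nothing about `name` says it is special: it is a plain str until something tries to encode it strictly, for example a print() to a UTF-8 terminal.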
Python tries to sweep encodings under the rug. It makes the encoding a default value all over the place and hides conversions everywhere.
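As one illustration of a hidden default: open() in text mode picks an encoding for you when none is passed, typically the locale's preferred one. The sketch below writes UTF-8 and reads it back as Latin-1 to stand in for a mismatched default; the file name and sample text are made up.

```python
import locale
import os
import tempfile

# The encoding open() typically falls back to when none is given.
default_encoding = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("naïve café\n")
    # Stand-in for a wrong default: no exception, just wrong text.
    with open(path, encoding="latin-1") as f:
        misread = f.read()
    with open(path, encoding="utf-8") as f:
        correct = f.read()

print(default_encoding)
print(misread == correct)   # prints: False
```

The mismatch raises nothing, which is how this kind of bug tends to reach users before it reaches the developer.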
I 100% agree that developers need to think about encodings and handle them in their programs. That's exactly why I hate string handling in Python 3: because rather than making you handle the corner cases, it pretends they don't exist, until they're found one by one by your users.
Python 3 encourages developers to write broken string handling code.