Hacker News
by slavik81 2735 days ago
So, write the program correctly. Show me.

> Python 3 decided to make the UNIX-y scripters actually learn what a horrid mess UNIX is with respect to locales and encodings and filesystem paths

Show me how much Python 3 improves this. To extend the earlier program, make it read from a directory named 'input' full of files and write the processed files to a directory named 'output'. Print each file name as the corresponding file is processed to indicate progress.
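A minimal sketch of that exercise, assuming POSIX semantics and Python 3.6+. The per-file processing is never specified, so `process_file()` is a hypothetical placeholder that copies bytes unchanged; `process_directory()` and its `log` parameter are likewise illustrative names. The interesting part is the progress output: names from `os.scandir()` may carry PEP 383 surrogate escapes, so each one is turned back into its original bytes with `os.fsencode()` and written to a binary stream instead of being passed to `print()`.

```python
import os
import shutil
import sys
from typing import BinaryIO, List, Optional


def process_file(src: str, dst: str) -> None:
    # Hypothetical placeholder for the real per-file work: copy bytes unchanged.
    shutil.copyfile(src, dst)


def process_directory(in_dir: str = "input", out_dir: str = "output",
                      log: Optional[BinaryIO] = None) -> List[str]:
    # Progress goes to a binary stream so undecodable names cannot crash it.
    if log is None:
        log = sys.stdout.buffer
    os.makedirs(out_dir, exist_ok=True)
    with os.scandir(in_dir) as it:
        entries = sorted((e for e in it if e.is_file()), key=lambda e: e.name)
    done = []
    for entry in entries:
        # entry.name may contain surrogate escapes (PEP 383); os.fsencode()
        # restores the exact bytes stored on disk.
        log.write(os.fsencode(entry.name) + b"\n")
        log.flush()
        # Joining str paths keeps the escapes, so the output file name is
        # byte-for-byte identical to the input file name.
        process_file(entry.path, os.path.join(out_dir, entry.name))
        done.append(entry.name)
    return done


if __name__ == "__main__":
    process_directory()
```

Passing the raw bytes through is one design choice; the alternative is decoding each name for display with an explicit error handler, which is always printable but no longer shows the exact name.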

I would be applauding Python if it did make the difficulties with this exercise obvious, but it absolutely does not. The file system APIs return strings, but the strings they return may not be valid Unicode. PEP 383 turned the Python 3 str type into a bag of bytes.
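To make that concrete, the sketch below decodes a non-UTF-8 name the way PEP 383 does on a POSIX system whose filesystem encoding is UTF-8 (an assumption; `os.listdir()` would do this implicitly), then shows that the resulting `str` cannot be encoded back to UTF-8 without an error handler. `is_strictly_encodable()` is a hypothetical helper.

```python
def is_strictly_encodable(name: str, encoding: str = "utf-8") -> bool:
    # True only if every code point in name encodes without an error handler.
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Bytes of "café" as written by a Latin-1 tool: 0xE9 alone is not valid UTF-8.
raw = b"caf\xe9"

# PEP 383 decoding, as applied to filesystem names under a UTF-8 locale:
# the bad byte 0xE9 becomes the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

print(repr(name))                     # 'caf\udce9'
print(is_strictly_encodable(name))    # False
print(is_strictly_encodable("café"))  # True
```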

Python tries to sweep encodings under the rug. It makes the encoding a default value all over the place and hides conversions everywhere.
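For reference, a small sketch that reports where those implicit defaults come from in the standard library. `implicit_encodings()` is a hypothetical name, and the values it returns depend on the platform, locale and interpreter flags.

```python
import locale
import sys


def implicit_encodings() -> dict:
    """Report the encodings Python 3 applies when none is passed explicitly."""
    return {
        # Used by open() in text mode when encoding= is omitted.
        "open() default": locale.getpreferredencoding(False),
        # Used for file names, e.g. the str results of os.listdir().
        "filesystem": (sys.getfilesystemencoding(),
                       sys.getfilesystemencodeerrors()),
        # Used by print(); may be missing if stdout has been replaced.
        "stdout": (getattr(sys.stdout, "encoding", None),
                   getattr(sys.stdout, "errors", None)),
    }


if __name__ == "__main__":
    for what, value in implicit_encodings().items():
        print(f"{what}: {value}")
```

Because these values differ between machines (and between Windows and Linux), passing `encoding=` and `errors=` to `open()` explicitly removes one source of surprises; since Python 3.10, running with `-X warn_default_encoding` emits an `EncodingWarning` wherever the `open()` default is silently used.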

I 100% agree that developers need to think about encodings and handle them in their programs. That's exactly why I hate string handling in Python 3: rather than making you handle the corner cases, it pretends they don't exist until your users find them, one by one.

Python 3 encourages developers to write broken string handling code.


> So, write the program correctly. Show me.

You just want to fight someone because you're angry, and I don't do that. No matter what someone writes, you'll find a way to argue that it's wrong and then prance around declaring "victory".

(I'd also bet that you probably couldn't do it if I were the one who got to set the evaluation criteria, and you also couldn't pass other "challenges" like writing proper HTTP handling -- the person who gets to grade the challenge always "wins", which is why you want to be the person who grades the challenge)

> PEP 383 turned the Python 3 str type into a bag of bytes

PEP 383 provided a way to read certain things -- primarily filesystem paths, which can be basically anything -- using an escape mechanism that replaces non-decodable bytes with surrogates when decoding to a string; those surrogates in turn allow lossless conversion back to the original bag of bytes.
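A short sketch of that escape mechanism, using explicit `bytes.decode()` and `str.encode()` calls with the `"surrogateescape"` error handler rather than any filesystem call, and assuming UTF-8 as the encoding. The helper names are illustrative only.

```python
def decode_escaped(raw: bytes, encoding: str = "utf-8") -> str:
    # PEP 383: undecodable bytes become lone surrogates instead of raising.
    return raw.decode(encoding, "surrogateescape")


def encode_escaped(text: str, encoding: str = "utf-8") -> bytes:
    # The same handler turns those surrogates back into the original bytes.
    return text.encode(encoding, "surrogateescape")


raw = b"caf\xe9 \xff.txt"
text = decode_escaped(raw)
assert encode_escaped(text) == raw  # lossless round trip

# Each undecodable byte 0xXX became the code point U+DCXX.
print([hex(ord(c)) for c in text if 0xDC80 <= ord(c) <= 0xDCFF])
# ['0xdce9', '0xdcff']
```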

Which is necessary, because there are real filesystems out there that really have paths and names that can never validly decode from any known text encoding. It doesn't turn strings into "bags of bytes"; the resulting str is still an iterable of actual valid Unicode code points.

> Python tries to sweep encodings under the rug.

As the saying goes, you can't reason someone out of a position they didn't reason their way into, so I won't try here.

> Python 3 encourages developers to write broken string handling code.

Python 3 no longer tries to cover for the random gibberish that's legal to put in filesystem paths, and makes the developer handle it. Are there lots of developers out there who don't realize that filesystem paths can legally contain undecodable garbage? Sure. That's not Python's problem to solve, though; it gives you the surrogateescape error handler and the os.fsencode() helper, and keeps working on things like PEP 538 and PEP 540 to try to give you tools to work around it. But Python can't magically fix the mess that is UNIX locales and bag-of-bytes paths (nothing can, short of burning UNIX down and starting over), and doesn't try to do it for you.
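A sketch of those tools in use, assuming a POSIX system where `os.fsencode()` applies the surrogateescape handler. `to_disk_bytes()` and `display_name()` are hypothetical names: the first recovers the exact bytes for file operations, the second produces a readable, lossy, surrogate-free form for messages.

```python
import os
import sys


def to_disk_bytes(name: str) -> bytes:
    # Exact on-disk bytes: os.fsencode() uses the filesystem encoding and,
    # on POSIX, the surrogateescape handler, so PEP 383 escapes are undone.
    return os.fsencode(name)


def display_name(name: str) -> str:
    # Lossy form for messages: undecodable bytes are shown as \xNN escapes
    # instead of leaving lone surrogates in the string.
    return os.fsencode(name).decode(sys.getfilesystemencoding(),
                                    "backslashreplace")


if __name__ == "__main__":
    for name in os.listdir("."):
        print(display_name(name))
```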

You're right that I'm frustrated. I'm not even upset at anyone in this thread; it's baggage from previous discussions, and I apologise for carrying it here. Having my morality questioned when I legitimately want to make programs work correctly for foreign languages has left me... emotional on this subject.

Unix has its own problems, but the program avian wrote works correctly on Linux. Most encoding issues I encountered when working with Python 3 were on Windows.

PEP 383 was making the best of a bad situation without breaking the API again. The real mistake was having the functions return strings in 3.0. The operating system APIs should have returned path objects that require an explicit conversion to string with an explicit error handling mechanism.
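A rough sketch of what such an API could look like, using a hypothetical `RawName` class that is not part of Python: it keeps the OS bytes, can still be handed to `open()` or `os.path.join()` through the `os.PathLike` protocol, and refuses implicit `str()` conversion so every decode has to name an encoding and an error handler.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RawName:
    """Hypothetical OS name type: holds bytes, refuses implicit str()."""
    raw: bytes

    def __fspath__(self) -> bytes:
        # Usable directly with open(), os.path.join(), etc. as a bytes path.
        return self.raw

    def __str__(self) -> str:
        raise TypeError("decode explicitly: name.decode(encoding, errors)")

    def decode(self, encoding: str, errors: str) -> str:
        # Both arguments are required, so the error policy is always visible.
        return self.raw.decode(encoding, errors)


def list_names(directory: str) -> List[RawName]:
    # Passing a bytes path makes os.listdir() return undecoded bytes names.
    return [RawName(n) for n in sorted(os.listdir(os.fsencode(directory)))]
```

With this design, `print(name)` fails loudly at the first use instead of at the first user with an unusual file name.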

Python gives you all the tools you need to do this right, but they're easy to unknowingly use in ways that break on corner cases. A well-defined API should guide you towards the correct solution and should make pitfalls obvious.
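A small illustration of that kind of pitfall, assuming UTF-8 output: the same list of names, one of which carries a PEP 383 escape, either raises or round-trips depending only on the `errors` argument of the text stream. `write_names()` is a hypothetical helper, and `io.BytesIO` stands in for a terminal or file.

```python
import io


def write_names(names, errors: str = "strict") -> bytes:
    """Write names to a UTF-8 text stream and return the bytes produced."""
    buf = io.BytesIO()
    # print() ultimately writes to a text stream like this one (sys.stdout),
    # and that stream's errors setting decides what happens to escapes.
    out = io.TextIOWrapper(buf, encoding="utf-8", errors=errors)
    for name in names:
        out.write(name + "\n")
    out.flush()
    return buf.getvalue()


names = ["report.txt", "caf\udce9.txt"]  # second name came from undecodable bytes

try:
    write_names(names)  # strict handler
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)  # strict: surrogates not allowed

print(write_names(names, errors="surrogateescape"))
# b'report.txt\ncaf\xe9.txt\n'
```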

In any case, I should probably give it a rest. I work hard to make sure my programs do this stuff right, and I suppose that's all I can really do.