| So, write the program correctly. Show me. You just want to fight someone because you're angry, and I don't do that. No matter what someone writes you'll find a way to argue into it being wrong and then prance around declaring "victory". (I'd also bet that you probably couldn't do it if I were the one who got to set the evaluation criteria, and you also couldn't pass other "challenges" like writing proper HTTP handling -- the person who gets to grade the challenge always "wins", which is why you want to be the person who grades the challenge) PEP 383 turned the Python 3 str type into a bag of bytes PEP 383 provided a way to read certain things -- primarily filesystem paths which can be basically anything -- using an escape mechanism to replace non-decodable bytes with surrogates when decoding to string, which in turn allow losslessly transforming back to the original bag of bytes. Which is necessary, because there are real filesystems out there that really have paths and names that can never validly decode from any known text encoding. It doesn't turn strings into "bags of bytes"; the resulting str still is an iterable of actual valid Unicode code points. Python tries to sweep encodings under the rug. As the saying goes, you can't reason someone out of a position they didn't reason their way into, so I won't try here. Python 3 encourages developers to write broken string handling code. Python 3 no longer tries to cover for the random gibberish that's legal to put in filesystem paths, and makes the developer handle it. Are there lots of developers out there who don't realize that filesystem paths can legally contain undecodable garbage? Sure. That's not Python's problem to solve, though; it gives you the surrogateescape handler, and the fsencode helper, and keeps working on things like PEP 538 and PEP 540 to try to give you tools to work around it. But Python can't magically fix the mess that is UNIX locales and bag-of-bytes paths (nothing can, short of burning UNIX down and starting over), and doesn't try to do it for you. |
Unix has its own problems, but the program avian wrote works correctly on Linux. Most encoding issues I encountered when working with Python 3 were on Windows.
PEP 383 was making the best of a bad situation without breaking the API again. The real mistake was having the functions return strings in 3.0. The operating system APIs should have returned path objects that require an explicit conversion to string with an explicit error handling mechanism.
Python gives you all the tools you need to do this right, but they're easy to unknowingly use in ways that break on corner cases. A well-defined API should guide you towards the correct solution and should make pitfalls obvious.
In any case, I should probably give it a rest. I work hard to make sure my programs do this stuff right, and I suppose that's all I can really do.