by shawnz 2775 days ago
This might be true if you're an English speaker, running the script on an English platform and only consuming data from English services, and you are also sure that no non-English speaker will ever take over the development of your script and that you will never have to localize it to other languages. Otherwise, it's exactly the opposite. Python 3 is what you should be using if you want 0% chance of being stopped by a string encoding issue.

Python 3 might occasionally require some extra steps when consuming strings compared to Python 2, but the reality is that those steps were always necessary. Python 2 just hid those details in a way that was only really safe for English-exclusive development. That doesn't mean that Python 2 is easier to use or less brittle. In fact, I would say it means the opposite.

2 comments

For most strings I don't care about language, encoding, or related overhead. In my scripts they are best dealt with as opaque bytes with a few specific byte patterns that are the same in ASCII and UTF-8, as well as in various other encodings.
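A rough sketch of what I mean, written for Python 3 with bytes objects. The helper name `split_fields` and the colon-separated sample line are made up for illustration. It relies on the fact that ASCII bytes such as `:` and newline never occur inside a multi-byte UTF-8 sequence and have the same value in Latin-1.

```python
def split_fields(line, sep=b":"):
    """Split a raw line on an ASCII separator without ever decoding it."""
    # Strip the line ending, then split on the separator byte pattern.
    return line.rstrip(b"\r\n").split(sep)

# The same text in two different encodings (hypothetical sample data).
utf8_line = "müller:Köln\n".encode("utf-8")
latin1_line = "müller:Köln\n".encode("latin-1")

# Both split into two fields; the field contents stay opaque bytes.
utf8_fields = split_fields(utf8_line)
latin1_fields = split_fields(latin1_line)
```

This only holds for ASCII-compatible encodings; something like UTF-16 would break it, since there the separator is not a single ASCII byte.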

The last Unicode issue I had was on a system with German characters, because some library assumed it had to explicitly perform encoding and used a bad default setting. If the library hadn't tried to be smart, the program would have worked independently of system or language; instead it failed on any non-English system by trying to convert a perfectly fine, system-specific encoding to UTF-8.

Python2 worked fine with Russian input/output.