| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by rjeli 2767 days ago
	Yeah, for writing scripts like this - python 2: 0% chance of thinking about Unicode. python 3: 5% chance i have to waste time debugging some random str decoding issue, no benefits. why bother?

4 comments

shawnz 2767 days ago

This might be true if you're an english speaker, running the script on an english platform and only consuming data from english services. And also you are sure that no non-english speaker will ever take over the development of your script and you will never have to localize it to other languages. Otherwise, it's exactly the opposite. Python 3 is what you should be using if you want 0% chance of being stopped by a string encoding issue.

Python 3 might occasionally require some extra steps when consuming strings compared to Python 2, but the reality is that those steps were always necessary. Python 2 just hid those details in a way that was only really safe for english-exclusive development. That doesn't mean that Python 2 is easier to use or less brittle. In fact I would say it means the opposite.

link

josefx 2767 days ago

For most strings I don't care about language, encoding or related overhead. In my scripts they are best dealt with as ophaque bytes with a few specific byte patterns that are the same in ascii and utf8, as well as various other encodings.

Last unicode issue I had was on a system german characters, because some library assumed it had to explicitly perform encoding with a bad default setting. If the library didn't try to be smart the program would have worked independently of system or language, instead it failed on any non english system by trying to convert a perfectly fine, system specific encoding to utf8.

link

_0w8t 2767 days ago

Python2 worked fine with Russian input/output.

link

hackbinary 2767 days ago

Why bother? Because Python 2 is EOL in a little less than 2 years.

The NHS chose not to upgrade to from Windows XP and 2003 and look where that got them last year: a massive crypto locker infection.

link

rphlx 2767 days ago

FWIW Win10 was affected by that same horrific SMB RCE vuln. So the implied argument that 10 has been immune or even much less vulnerable to ransomware over the past year or two is on shaky ground, though I agree it probably will start to have some merit going forward.

link

rwmj 2767 days ago

I suppose unless someone either forks it or keeps delivering patches outside the Python project. That wasn't really an option for Windows XP, but I'm quite sure if it had been then someone would be doing it.

link

sweettea 2767 days ago

This is already true -- tauthon is a updated, patched Python 2 fork. https://github.com/naftaliharris/tauthon

link

cmyr 2767 days ago

Put another way: python 2 has a 0% chance of correctly handling Unicode.

link

mjevans 2767 days ago

Not actually the case. UTF-8, using only SPECIFIC operations that don't try to split up strings or replace things that aren't exact matches for a given text, will result in valid output as long as there was valid input.

All interchanged Unicode text should be UTF-8, never use another encoding* (without a really compelling reason).

No, storing it as an array of unicode characters isn't a compelling reason during interchange.

ALSO, never use a BOM; that will break things.

The second answer (should be anchor-linked) in this goes over MOST of the advantages of UTF-8, but it doesn't capture that some carefully input operations in otherwise completely Unicode //unaware// 'string' functions result in no change to string validity.

The only potential issue is if recognizing something in different normalization representations is important. However, for nearly all quick and dirty tasks (where a short script is most likely) it usually doesn't matter. For everything else a different paradigm than the one Python3 picked would be better. (One where adding filters to a read file is OPTIONAL and they can be invoked on individual byte-strings as well.)

https://stackoverflow.com/questions/6882301/what-is-the-best...

link

baq 2767 days ago

You must live in the US.

link