Yeah, for writing scripts like this: Python 2: 0% chance of thinking about Unicode. Python 3: 5% chance I have to waste time debugging some random str decoding issue, no benefits. Why bother?
This might be true if you're an English speaker, running the script on an English platform and only consuming data from English services, and you are also sure that no non-English speaker will ever take over development of your script and that you will never have to localize it to other languages. Otherwise, it's exactly the opposite. Python 3 is what you should be using if you want 0% chance of being stopped by a string encoding issue.
Python 3 might occasionally require some extra steps when consuming strings compared to Python 2, but the reality is that those steps were always necessary. Python 2 just hid those details in a way that was only really safe for English-exclusive development. That doesn't mean that Python 2 is easier to use or less brittle. In fact, I would say it means the opposite.
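To make the "extra steps" concrete, here is a minimal sketch of what consuming bytes looks like in Python 3. The sample word and the choice of UTF-8 are assumptions made for illustration; real input would come from a file, pipe, or socket.

```python
# Bytes as they might arrive from a file or socket.
# The sample text is "Gruesse" spelled with u-umlaut and sharp s, encoded as UTF-8.
raw = "Gr\u00fc\u00dfe".encode("utf-8")

# Python 3 makes the decoding step explicit: the caller names the encoding
# instead of the language silently guessing one.
text = raw.decode("utf-8")

# Decoding with the wrong codec fails immediately rather than corrupting
# data somewhere further down the line.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(len(raw), len(text), ascii_ok)
```

The byte length and the character length differ, which is exactly the detail Python 2's implicit ASCII handling tended to hide until non-ASCII data showed up.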
For most strings I don't care about language, encoding or related overhead. In my scripts they are best dealt with as opaque bytes with a few specific byte patterns that are the same in ASCII and UTF-8, as well as various other encodings.
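For what it's worth, the opaque-bytes approach is still expressible in Python 3 by staying in `bytes` throughout. A rough sketch, where `split_key_value` and the sample lines are hypothetical and only ASCII-compatible encodings are assumed:

```python
from typing import Tuple


def split_key_value(line: bytes) -> Tuple[bytes, bytes]:
    """Split b'key = value' on the first ASCII '=' without decoding anything."""
    key, _, value = line.partition(b"=")
    # bytes.strip() removes ASCII whitespace only, so payload bytes are untouched.
    return key.strip(), value.strip()


# The same logical line in two different encodings; the function never
# needs to know which one it is looking at.
latin1_line = b"name = M\xfcller\n"
utf8_line = "name = M\u00fcller\n".encode("utf-8")

latin1_result = split_key_value(latin1_line)
utf8_result = split_key_value(utf8_line)
print(latin1_result, utf8_result)
```

This only holds for encodings where the delimiter bytes cannot appear inside another character (ASCII, Latin-1, UTF-8 and similar); it would not be safe for UTF-16.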
The last Unicode issue I had was on a system with German characters, because some library assumed it had to explicitly perform encoding, and did so with a bad default setting. If the library hadn't tried to be smart, the program would have worked independently of system or language; instead it failed on any non-English system by trying to convert a perfectly fine, system-specific encoding to UTF-8.
FWIW Win10 was affected by that same horrific SMB RCE vuln. So the implied argument that 10 has been immune or even much less vulnerable to ransomware over the past year or two is on shaky ground, though I agree it probably will start to have some merit going forward.
I suppose unless someone either forks it or keeps delivering patches outside the Python project. That wasn't really an option for Windows XP, but I'm quite sure if it had been then someone would be doing it.
Not actually the case. With UTF-8, as long as you use only SPECIFIC operations (ones that don't split strings at arbitrary points or replace anything other than exact matches for a given text), valid input will result in valid output.
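A small sketch of that claim, using made-up sample text: exact-match replacement and splitting on an ASCII delimiter keep UTF-8 valid even when done on raw bytes, while slicing at an arbitrary byte offset does not.

```python
# Sample text with non-ASCII characters, held as raw UTF-8 bytes.
data = "na\u00efve caf\u00e9, na\u00efve".encode("utf-8")

# Exact-match replacement: the pattern is itself valid UTF-8, and UTF-8 is
# self-synchronizing, so it can only match on character boundaries.
replaced = data.replace("caf\u00e9".encode("utf-8"), b"tea")
replaced_text = replaced.decode("utf-8")  # decodes without error

# Splitting on an ASCII delimiter is equally safe: ASCII bytes never occur
# inside a multi-byte UTF-8 sequence.
parts = [p.decode("utf-8") for p in data.split(b", ")]

# Counter-example: a blind byte slice can cut a character in half.
broken = data[:3]  # b"na" plus only the first byte of the two-byte i-diaeresis
try:
    broken.decode("utf-8")
    slice_ok = True
except UnicodeDecodeError:
    slice_ok = False

print(replaced_text, parts, slice_ok)
```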
All interchanged Unicode text should be UTF-8; never use another encoding (without a really compelling reason).
No, storing it as an array of Unicode characters isn't a compelling reason during interchange.
ALSO, never use a BOM; that will break things.
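One way a BOM breaks things, sketched with a hypothetical CSV-style header: a consumer that decodes as plain UTF-8 keeps the BOM as an invisible U+FEFF glued to the first field.

```python
import codecs

# Python's "utf-8-sig" codec prepends the UTF-8 BOM when encoding.
with_bom = "id,name".encode("utf-8-sig")
has_bom = with_bom.startswith(codecs.BOM_UTF8)

# A consumer decoding as plain UTF-8 keeps the BOM as a U+FEFF character,
# so the first field no longer compares equal to "id".
first_field = with_bom.decode("utf-8").split(",")[0]

# Decoding with "utf-8-sig" strips the BOM, but every consumer has to know to do that.
tolerant_first = with_bom.decode("utf-8-sig").split(",")[0]

print(has_bom, repr(first_field), repr(tolerant_first))
```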
The second answer (should be anchor-linked) in this goes over MOST of the advantages of UTF-8, but it doesn't capture that some operations in otherwise completely Unicode //unaware// 'string' functions, given carefully chosen inputs, result in no change to string validity.
The only potential issue is if recognizing the same text in different normalization forms is important. However, for nearly all quick and dirty tasks (where a short script is most likely) it usually doesn't matter. For everything else, a different paradigm than the one Python 3 picked would be better. (One where adding filters to a read file is OPTIONAL and they can be invoked on individual byte-strings as well.)
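The normalization caveat looks roughly like this, with a made-up sample word: the same visible text can be two different code point sequences (and therefore two different byte sequences), so exact-match operations miss one of them unless the text is normalized first.

```python
import unicodedata

# The same visible word: once with a precomposed e-acute (U+00E9),
# once as plain "e" followed by a combining acute accent (U+0301).
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# Different code points, hence different UTF-8 bytes and no exact match.
same_raw = composed == decomposed
byte_lengths = (len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))

# Normalizing both sides to one form (NFC here) makes them comparable.
same_nfc = unicodedata.normalize("NFC", decomposed) == composed

print(same_raw, byte_lengths, same_nfc)
```

For a quick script that only matches ASCII keywords or delimiters, this never comes up, which is the point made above.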