by ubernostrum 3208 days ago
This would be the case in Python 2, where source code files are assumed to be ASCII-encoded unless there's an encoding comment at the top of the file.

In Python 3, source code files are assumed to be UTF-8.
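To see which encoding Python 3 would pick for a given file, something like the following should work. It is only a rough sketch: the helper name detect_source_encoding and the two sample byte strings are made up for illustration, while tokenize.detect_encoding is the standard-library function that reads the encoding comment (or BOM) and otherwise falls back to UTF-8.

```python
import io
import tokenize


def detect_source_encoding(source_bytes):
    """Return the encoding Python 3 would use to decode this source (hypothetical helper)."""
    # detect_encoding() reads at most two lines, looking for a BOM or an
    # encoding comment; with neither present it reports the default, UTF-8.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Made-up sample sources: one without an encoding comment, one with.
no_cookie = b"name = 'caf\xc3\xa9'\n"
with_cookie = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"

print(detect_source_encoding(no_cookie))    # utf-8
print(detect_source_encoding(with_cookie))  # iso-8859-1
```

Note that the declared name comes back normalized, so latin-1 is reported as iso-8859-1.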


Interesting that Python 2 couldn't fix that in a hotfix/point release... UTF-8 is backwards compatible with ASCII, so it shouldn't break anything if source started being interpreted as UTF-8. I'd be curious to see what their reasoning was.
I would imagine Python's approach to introducing new language features had a lot to do with it. Having to go through the PEP system takes some time, and changes like these tend to be reserved for minor-version releases. All in all, I love the PEP system. It's such an open concept, and I've been surprised by the number of quality proposals that get implemented. Wish Go had something like it.
The change to UTF-8 source encoding also changed the legal set of characters for identifiers and specified how to normalize them, which in turn is the reason behind this thing I posted on Twitter a while back:

https://gist.github.com/ubernostrum/b7b705bf21b86a1b5c1e2c9f...

It's also a big enough change that it couldn't really have happened in Python 2.
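For anyone wondering what "normalize" means in practice, here is a small sketch. It is not a reproduction of the gist above (the link is truncated here); the identifier and value are made up. As far as I know, Python 3 converts identifiers to NFKC while parsing, so a name spelled with a compatibility character ends up bound under its normalized spelling.

```python
import unicodedata

# U+FB01 is the single-code-point "fi" ligature; NFKC folds it into "f" + "i".
ligature_name = "\ufb01le"
normalized = unicodedata.normalize("NFKC", ligature_name)

# Assign to the ligature spelling and see which name actually gets bound.
namespace = {}
exec(ligature_name + " = 42", namespace)

print(normalized)                  # file
print("file" in namespace)         # True
print(ligature_name in namespace)  # False
```

So two identifiers that look different in the source can refer to the same variable, which is the kind of behavior Python 2's ASCII-only identifiers never had to deal with.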

Correct, this was a codebase that still had some Pylons (gasp! Not even Pyramid, but legit Pylons) code.