by dguaraglia 3207 days ago
My favorite story about Python's handling of Unicode was when one of my coworkers did a hotfix for our Python website, wrote tests, confirmed everything worked as expected... but right before committing and pushing to production wrote a comment like:

# Apparently we expect the field to be in this format ¯\_(ツ)_/¯

Right above the code he'd just fixed.

Of course, the moment we pushed the update it brought production down, because the Python interpreter doesn't understand Unicode in source files unless you specify which encoding you are using.

After that, "¯\_(ツ)_/¯" became a synonym for his name on our HipChat server, heh.

This would be the case in Python 2, where source code files are assumed to be ASCII-encoded unless there's an encoding comment at the top of the file.

In Python 3, source code files are assumed to be UTF-8.
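
A minimal sketch of the difference, assuming a Python 3 interpreter. Python 3 can't behave like Python 2 directly, so the sketch stands in for Python 2's implicit ASCII default with an explicit PEP 263 `# -*- coding: ascii -*-` declaration. The `compiles` helper and the sample source are illustrative, not taken from the original project.

```python
# Sketch: how the declared source encoding decides whether a file with a
# non-ASCII comment compiles. Runs on Python 3; the "ascii" declaration
# is only a stand-in for Python 2's implicit ASCII default.

SHRUG = "\u00af\\_(\u30c4)_/\u00af"  # the characters ¯\_(ツ)_/¯

# The offending comment from the story, followed by ordinary code.
BODY = (
    "# Apparently we expect the field to be in this format " + SHRUG + "\n"
    + "field = 'ok'\n"
)


def compiles(source: bytes) -> bool:
    """Return True if CPython can decode and compile the raw source bytes."""
    try:
        compile(source, "<hotfix>", "exec")
        return True
    except (SyntaxError, ValueError):
        # Undecodable source is normally reported as SyntaxError;
        # ValueError (the base class of UnicodeDecodeError) is a fallback.
        return False


utf8_bytes = BODY.encode("utf-8")

# No declaration: Python 3 assumes UTF-8, so this compiles.
print(compiles(utf8_bytes))                                 # True
# ASCII declaration: mimics Python 2's default, so the comment breaks it.
print(compiles(b"# -*- coding: ascii -*-\n" + utf8_bytes))  # False
# UTF-8 declaration: the line the Python 2 hotfix would have needed.
print(compiles(b"# -*- coding: utf-8 -*-\n" + utf8_bytes))  # True
```

Note that the failure is triggered by a comment, which is exactly why tests that only exercised the code path didn't catch it.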

Interesting that Python 2 couldn't fix that in a hotfix/point release... UTF-8 is backwards compatible with ASCII, so it shouldn't break anything if source started being interpreted as UTF-8. I'd be curious to see what their reasoning is.
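
The compatibility claim is easy to check: every byte below 0x80 decodes to the same character under ASCII and UTF-8, while UTF-8's multi-byte sequences use only bytes at or above 0x80, which an ASCII decoder rejects. A quick sketch (the `is_ascii_decodable` helper is illustrative):

```python
def is_ascii_decodable(data: bytes) -> bool:
    """Return True if the bytes are valid 7-bit ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


# All 128 ASCII byte values decode identically as ASCII and as UTF-8.
ascii_bytes = bytes(range(128))
print(ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8"))  # True

# The shrug's UTF-8 encoding contains bytes >= 0x80, so ASCII rejects it.
shrug = "\u00af\\_(\u30c4)_/\u00af".encode("utf-8")
print(is_ascii_decodable(b"plain ASCII comment"))  # True
print(is_ascii_decodable(shrug))                   # False
```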

I would imagine Python's approach to introducing new language features had a lot to do with it. Going through the PEP process takes time, and changes like this tend to be reserved for minor-version releases. All in all, I love the PEP system: it's such an open concept, and I've been surprised by the number of high-quality proposals that get implemented. Wish Go had something like it.

The change to UTF-8 source encoding also changed the legal set of characters for identifiers and specified how to normalize them, which in turn is the reason behind this thing I posted on Twitter a while back:

https://gist.github.com/ubernostrum/b7b705bf21b86a1b5c1e2c9f...

It's also a big enough change that it couldn't really have happened in Python 2.
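
For the identifier side, PEP 3131 has Python 3 normalize identifiers to NFKC while parsing, so two spellings that look different can name the same variable. A small sketch using the standard `unicodedata` module and `exec`; the fullwidth `ｘ` (U+FF58) is just one illustrative character whose NFKC form is plain ASCII `x`:

```python
import unicodedata

FULLWIDTH_X = "\uff58"  # 'ｘ', FULLWIDTH LATIN SMALL LETTER X

# NFKC folds the fullwidth letter to its ASCII compatibility equivalent.
print(unicodedata.normalize("NFKC", FULLWIDTH_X))  # x

# The parser applies NFKC to non-ASCII identifiers, so assigning to the
# fullwidth name actually binds the ASCII name 'x' in the namespace.
namespace: dict = {}
exec(FULLWIDTH_X + " = 42", namespace)
print(namespace["x"])  # 42

# The allowed character set changed too: 'é' may appear in an identifier,
# while the macron from the shrug (a modifier symbol) may not.
print("caf\u00e9".isidentifier())  # True
print("\u00af".isidentifier())     # False
```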

Correct, this was a codebase that still had some Pylons (gasp! Not even Pyramid, but legit Pylons) code.