Hacker News
by the_mitsuhiko 4087 days ago
> Doing any non-ASCII string-processing in Python 2 with any regularity is a more than regular-enough beating for me.

Can you elaborate on this? Unicode processing is the same on 2.x and 3.x for the most part. There are some differences: the interpreter internals and the internal string representation changed, string literals are represented differently (unprefixed literals now default to unicode), and implicit coercion between byte strings and unicode strings was removed. Other than that, the unicode support is more or less equivalent.

1 comment

The big issue with Python 2 is that it's easy to accidentally mix unicode and byte strings during development, and it works fine until some user has a non-ASCII home directory or similar. You get bugtracker conversations like "it crashes on XP but works on Vista", "I can't find this traceback file you mentioned", "what is a file system encoding?", etc.
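A rough sketch of that failure mode, written for Python 3 so it runs today. Python 2 decoded byte strings as ASCII behind your back whenever they were mixed with unicode strings; here that step is spelled out explicitly. The helper `py2_style_concat` and the example paths are made up for illustration.

```python
def py2_style_concat(text, raw):
    """Approximate Python 2's u"..." + "...": the byte string is decoded as ASCII."""
    # Python 2 performed this decode implicitly; Python 3 would raise TypeError instead.
    return text + raw.decode("ascii")


prefix = "Home directory: "
ascii_home = "/home/alice".encode("utf-8")    # what the developer tests with
umlaut_home = "/home/jürgen".encode("utf-8")  # what some user eventually has

# Pure-ASCII bytes pass through, so the bug stays hidden during development.
print(py2_style_concat(prefix, ascii_home))

# The first non-ASCII byte makes the same code path blow up.
try:
    py2_style_concat(prefix, umlaut_home)
except UnicodeDecodeError as exc:
    print("fails only for non-ASCII input:", exc.reason)
```

The point is that nothing in the code changes between the working and the crashing call, only the data does.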

Or you write a logfile parser and it works great for half a year, until March, when you get a UnicodeEncodeError because March is "März" in German, the only month with an umlaut.
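To illustrate the month example, here is a small Python 3 sketch that checks which German month names survive an ASCII encode. The explicit `encode("ascii")` stands in for what Python 2 did implicitly when a unicode string was written to an ASCII-encoded stream; the function name `months_that_break_ascii` is invented for this example.

```python
MONTHS_DE = [
    "Januar", "Februar", "März", "April", "Mai", "Juni",
    "Juli", "August", "September", "Oktober", "November", "Dezember",
]


def months_that_break_ascii(months):
    """Return the month names that cannot be encoded as ASCII."""
    failing = []
    for name in months:
        try:
            # Stand-in for Python 2's implicit encode on output.
            name.encode("ascii")
        except UnicodeEncodeError:
            failing.append(name)
    return failing


print(months_that_break_ascii(MONTHS_DE))  # → ['März']
```

A parser tested against logs from any other month would never hit the error, which is why it can run fine for half a year first.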