> The fact that they went out of their way to break python 2 unicode when running on python 3 was just totally nuts. Especially after making such a big deal about unicode!

IMO it's infinitely worse than that.

The big deal about Unicode is its nature, as defined in the "Summary Narrative" from 1991[0]. To wit:

> The Unicode character encoding derives its name from three main goals:
>
> * universal (addressing the needs of world languages)
> * uniform (fixed-width codes for efficient access), and
> * unique (bit sequence has only one interpretation into character codes)

The Unicode folk realized that it would take decades to shift developers worldwide to doing that properly, so they adopted a three stage plan for software (eg the string types of programming languages) to get from where things were to where they needed to be:

* Stage #1: Character = byte
* Stage #2: Character = code point
* Stage #3: Character = what a user thinks of as a character[1]

Python 1 was a Stage #1 language -- Character = byte -- like most others of its time.

In Python 2 there were tweaks to try to move toward Stage #2 -- Character = code point -- again, like most other PLs of its time.

In Python 3, they dictated a full switch to Stage #2 -- Character = code point. That was an unnecessarily painful break relative to Python 2. But -- and this is what really matters -- they entirely ignored Stage #3, which is the whole point of Unicode in the final analysis.

[0] https://www.unicode.org/history/summary.html
[1] https://unicode.org/glossary/#grapheme
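
To make the three stages concrete, here is a rough Python 3 sketch that counts the same string three ways. The helper names are made up for illustration, and the Stage #3 count is only an approximation that assumes combining marks are the only thing that merges code points into one user-perceived character; real grapheme segmentation has more rules than this.

```python
import unicodedata


def count_bytes(text: str) -> int:
    """Stage #1 view: number of bytes in the UTF-8 encoding."""
    return len(text.encode("utf-8"))


def count_code_points(text: str) -> int:
    """Stage #2 view: what len() reports for a Python 3 str."""
    return len(text)


def count_user_characters_approx(text: str) -> int:
    """Rough Stage #3 view: fold each combining mark into the preceding character.

    Illustration only (an assumption, not the full grapheme cluster rules):
    it ignores things like emoji ZWJ sequences and Hangul jamo.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# "e" followed by U+0301 COMBINING ACUTE ACCENT, displayed as one character
text = "e\u0301"
print(count_bytes(text))                   # 3
print(count_code_points(text))             # 2
print(count_user_characters_approx(text))  # 1
```

The built-in `len()` gives the Stage #2 answer (2), while a user looking at the screen would say the string has one character.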