by slownews45
1763 days ago
The level of intentional compatibility breaking was crazy. The fact that they went out of their way to break Python 2 Unicode when running on Python 3 was just totally nuts. Especially after making such a big deal about Unicode! I've never seen anything like it, I don't think. Maybe the new Perl that never really landed?
Imo it's infinitely worse than that.
The big deal about Unicode is its nature, as defined in the "Summary Narrative" from 1991[0]. To wit:
> The Unicode character encoding derives its name from three main goals:
* universal (addressing the needs of world languages)
* uniform (fixed-width codes for efficient access), and
* unique (bit sequence has only one interpretation into character codes)
The Unicode folk realized that it would take decades to shift developers worldwide to doing that properly, so they adopted a three-stage plan for software (e.g. the string types of programming languages) to get from where things were to where they needed to be:
* Stage #1: Character = byte
* Stage #2: Character = code point
* Stage #3: Character = what a user thinks of as a character[1]
Python 1 was a Stage #1 language -- Character = byte -- like most others of its time.
In Python 2 there were tweaks to try to move toward Stage #2 -- Character = code point -- again, like most other PLs of its time.
In Python 3, they dictated a full switch to Stage #2 -- Character = code point. That was an unnecessarily painful break relative to Python 2. But -- and this is what really matters -- they entirely ignored Stage #3, which is the whole point of Unicode in the final analysis.
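To make the three stages concrete, here is a rough sketch in Python 3 using only the standard library. It counts the same string three ways: as bytes, as code points, and as something closer to what a user would call a character. Note that `approx_user_chars` is a hypothetical helper I made up for illustration; it only skips combining marks and is not the real grapheme cluster algorithm, so it will miscount things like emoji ZWJ sequences.

```python
import unicodedata


def approx_user_chars(text: str) -> int:
    """Crude approximation of a Stage #3 count (hypothetical helper).

    Assumption: counting code points that are not combining marks is
    "close enough" for this demo. It is NOT full grapheme segmentation
    and mishandles emoji sequences, Hangul jamo, and more.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one character to a user.
s = "e\u0301"

stage1 = len(s.encode("utf-8"))  # Stage #1: count bytes (UTF-8 here)
stage2 = len(s)                  # Stage #2: count code points (Python 3 str)
stage3 = approx_user_chars(s)    # Stage #3: rough user-perceived count

print(stage1, stage2, stage3)  # 3 2 1
```

So `len()` on a Python 3 `str` gives the Stage #2 answer (2), while a user would say the string has one character.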
[0] https://www.unicode.org/history/summary.html
[1] https://unicode.org/glossary/#grapheme