by gpderetta
1109 days ago
As a bit of history, Windows NT 3.1 was released in summer '93, and it was the first Windows version with Unicode support (UCS-2, not UTF-16, which didn't exist yet). Presumably development started well before that. UTF-8 was publicly presented at USENIX at the beginning of '93. Not sure when Unicode incorporated it. It is unlikely that Windows would have been changed at the last minute to use it, especially as the variable-length encoding of UTF-8 was significantly more complicated than the fixed-size UCS-2.
If only UTF-8 had been invented a little earlier, we could have avoided so much pain :-(
The idea of global variables like LANG= and LC_CTYPE= in C is utterly incoherent.
Python's notion of "default file system encoding" is likewise incoherent.
You can obviously process strings with two different encodings in the same program! Encodings are metadata, and metadata should be attached to data. Encodings shouldn't be global variables!
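To make that concrete, here is a rough sketch of what "attach the encoding to the data" could look like in Python. `EncodedText` is a made-up name for illustration, not anything from the standard library; the only real calls are `str.encode` and `bytes.decode`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedText:
    """Hypothetical helper: raw bytes paired with the encoding they were written in."""
    data: bytes
    encoding: str

    def decode(self) -> str:
        # Decode using the encoding carried with the bytes, not a process-wide default.
        return self.data.decode(self.encoding)


# Two byte strings in different encodings, handled side by side in one program.
a = EncodedText("naïve".encode("utf-8"), "utf-8")
b = EncodedText("naïve".encode("latin-1"), "latin-1")

same_text = a.decode() == b.decode()   # same text once each is decoded correctly
same_bytes = a.data == b.data          # but the underlying bytes differ
```

Neither value depends on LANG, LC_CTYPE, or any other global setting; each one says how to read itself.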
Python 3 made things worse in many ways, largely due to adherence to Windows legacy, and then finally introduced UTF-8 mode:
https://vstinner.github.io/painful-history-python-filesystem...