Hacker News
by wvenable 2393 days ago
Windows and many other operating systems and languages (e.g. Java) got on board with Unicode back when the character set would fit in 16 bits. The encoding originally used was UCS-2 (not UTF-16). UTF-16 came next, to extend the Unicode character set beyond 65,536 code points.
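For anyone wondering what "extending beyond 65,536 code points" looks like in practice, here is a rough Python sketch (the function name `utf16_code_units` is just made up for illustration). Code points below U+10000 stay a single 16-bit unit, exactly as in UCS-2; anything above is split into a surrogate pair taken from the U+D800 to U+DFFF range that UCS-2 never assigned to characters.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Encode one Unicode code point as a list of UTF-16 code units."""
    # Surrogates are not characters, and Unicode stops at U+10FFFF.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        # Basic Multilingual Plane: one 16-bit unit, identical to UCS-2.
        return [cp]
    # Supplementary planes: subtract 0x10000 to get a 20-bit value,
    # then put the top 10 bits in a high surrogate, the bottom 10 in a low one.
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)
    low = 0xDC00 + (v & 0x3FF)
    return [high, low]


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x1F600):
        units = utf16_code_units(cp)
        print(f"U+{cp:04X} ->", " ".join(f"{u:04X}" for u in units))
```

A UCS-2-only system can store the first two examples but has no way to represent U+1F600; a UTF-16 system stores it as the pair D83D DE00, which older UCS-2 code sees as two unknown "characters".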

UTF-8 wasn't even invented until well after all these operating systems and languages deployed Unicode.

They didn't "see the light" and use UTF-8 because they didn't have a time machine to make that possible.


I actually checked a while ago when UTF-8 was created, and it was around the same time that Windows NT was being developed with 16-bit "early" Unicode support. UTF-8 was created in September 1992 [1], and Windows NT came out in mid-1993, but I guess it was too late for Windows to switch to UTF-8 (and I guess the advantages of UTF-8 weren't as clear back then).

But IMHO there's no excuse to not use UTF-8 after around 1995 ;)

[1] https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt

Also, UTF-16 was only published in July 1996 (although the need for more than 16 bits was probably apparent a bit earlier). So before that, Unicode was only a 16-bit encoding, and UCS-2 was enough. UTF-8 was initially just a nice trick to keep using ASCII characters for things like directory separators (/) and single-byte NUL terminators. By 1995 its superiority certainly wasn't apparent yet.
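To make the "nice trick" concrete, here is a small Python sketch. `split_path_bytes` and `c_strlen` are hypothetical helpers standing in for old byte-oriented C code that only knows about ASCII. It assumes the sample path contains only BMP characters, so its UTF-16-LE encoding is byte-for-byte what UCS-2 would produce.

```python
def split_path_bytes(raw: bytes) -> list[bytes]:
    """Hypothetical ASCII-only path splitter: splits on the byte 0x2F ('/')."""
    return raw.split(b"/")


def c_strlen(raw: bytes) -> int:
    """Hypothetical stand-in for C's strlen(): bytes before the first NUL."""
    n = raw.find(b"\x00")
    return len(raw) if n == -1 else n


path = "/home/jürgen/文書"       # all characters are in the BMP
utf8 = path.encode("utf-8")
ucs2 = path.encode("utf-16-le")   # no BOM; equal to UCS-2 for BMP-only text

# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so byte-level code
# never sees a stray '/' or NUL inside a non-ASCII character.
utf8_parts = [p.decode("utf-8") for p in split_path_bytes(utf8)]

# In UCS-2, every ASCII character carries a 0x00 byte, so strlen() stops
# after the very first byte of the string.
print(utf8_parts)                           # ['', 'home', 'jürgen', '文書']
print(c_strlen(utf8) == len(utf8))          # True
print(c_strlen(ucs2))                       # 1
```

This is why UTF-8 could be dropped into existing Unix filesystems and C APIs unchanged, while UCS-2 needed a parallel set of wide-character APIs.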

Also, Windows internals were built entirely around 16-bit characters, including the NTFS on-disk format, so by 1992 that was already quite hard to change.

That said, it is crazy that NT didn't have full UTF-8 support, including in console windows, by about 2000.