Hacker News
by TillE 1356 days ago
I've dealt with two elusive bugs which were ultimately caused by Windows stupidly using UTF-8 with BOM by default. Python requires you to take extra steps to decode that garbage, and some C++ libraries can't handle it at all.
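For reference, the "extra steps" in Python mostly come down to choosing the `utf-8-sig` codec instead of `utf-8`. A minimal sketch, assuming the input is a byte string that starts with the UTF-8 BOM (the sample text is made up for illustration):

```python
import codecs

# Hypothetical sample: the UTF-8 BOM followed by UTF-8 encoded text,
# as a BOM-writing editor might save it.
raw = codecs.BOM_UTF8 + "héllo".encode("utf-8")

# Plain "utf-8" decodes successfully but keeps the BOM as a leading U+FEFF.
plain = raw.decode("utf-8")

# "utf-8-sig" strips a leading BOM if one is present...
clean = raw.decode("utf-8-sig")

# ...and leaves BOM-less input unchanged, so it is safe for both cases.
no_bom = "héllo".encode("utf-8").decode("utf-8-sig")

print(repr(plain))
print(repr(clean))
print(repr(no_bom))
```

The same codec name works with `open(path, encoding="utf-8-sig")` when reading files.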

I'm sure there were good reasons that BOM sounded like the right idea at Microsoft, but everyone else just used straight UTF-8 and it was fine.

1 comment

Windows supported Unicode in 1993 (NT 3.1) and 1995 (Win95) via UCS-2, a fixed-width 16-bit encoding.

In 1996 it became clear that 16 bits weren't enough, so Unicode 2.0 expanded the code space and introduced UTF-16, a variable-width encoding that uses a BOM to signal byte order.
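To make the variable-width and byte-order points concrete, here is a small sketch using Python's built-in codecs. The two sample characters are arbitrary choices for illustration:

```python
import codecs

bmp = "A"               # U+0041, fits in a single 16-bit code unit
astral = "\U0001F600"   # U+1F600, outside the 16-bit range

# Explicit-endianness codecs write no BOM.
bmp_le = bmp.encode("utf-16-le")        # one code unit, low byte first
bmp_be = bmp.encode("utf-16-be")        # one code unit, high byte first
astral_le = astral.encode("utf-16-le")  # surrogate pair: two code units

# The generic "utf-16" codec prepends a BOM in the machine's native byte order.
with_bom = bmp.encode("utf-16")

# On decode, the generic codec reads the BOM to pick the byte order.
from_le = (codecs.BOM_UTF16_LE + bmp_le).decode("utf-16")
from_be = (codecs.BOM_UTF16_BE + bmp_be).decode("utf-16")

print(len(bmp_le), len(astral_le))
print(bmp_le, bmp_be)
print(from_le, from_be)
```

Without the BOM, the same two bytes could be read as either U+0041 or U+4100, which is the ambiguity the BOM resolves.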

Windows 2000 supported UTF-16 on release.

Why didn't Windows 2000 support UTF-8, which was invented in 1992 and implemented in Plan 9 that same year? Who can say...