by sheetjs 1753 days ago
Pre-Unicode issues still haunt us today, kept alive by various file formats that rely on system encoding.

Under Apple's "Mac-Roman" encoding [1], the standard MacOS encoding before OSX switched to Unicode, byte 0xBD currently maps to capital omega (U+03A9 Ω). However, in the original 1994 release of the character set, Apple erroneously mapped it to the ohm sign (U+2126 Ω). Apple eventually fixed this in 1997, as noted in the changelog:

    #       n04  1997-Dec-01    Update to match internal utom<n3>, ufrm<n22>:
    #                           Change standard mapping for 0xBD from U+2126
    #                           to its canonical decomposition, U+03A9.

However, in 1996, Microsoft copied the Mac encoding into CP10000 using the incorrect character [2]. Unfortunately, the codepage was not corrected when Apple realized its mistake.

This discrepancy leads to a huge number of strange issues with various versions of Excel for Mac (BOM-less CSV, SYLK and other plaintext formats default to system encoding) and other software that uses Microsoft's interpretation of Apple's Mac-Roman encoding rather than Apple's official character set mapping.
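To make the mismatch concrete, here is a minimal Python sketch. The two constants are hardcoded from the mappings described above rather than read from the mapping files, and `same_after_nfc` is a hypothetical helper name, not part of any library. It shows that the two interpretations of byte 0xBD are different code points that only compare equal after canonical normalization:

```python
import unicodedata

# Mappings for byte 0xBD as described above (hardcoded for illustration,
# not parsed from ROMAN.TXT or the CP10000 table).
APPLE_0XBD = "\u03a9"      # GREEK CAPITAL LETTER OMEGA (Apple, since 1997)
MICROSOFT_0XBD = "\u2126"  # OHM SIGN (CP10000)


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after canonical (NFC) normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(unicodedata.name(APPLE_0XBD))       # GREEK CAPITAL LETTER OMEGA
    print(unicodedata.name(MICROSOFT_0XBD))   # OHM SIGN
    print(APPLE_0XBD == MICROSOFT_0XBD)       # False: different code points
    print(same_after_nfc(APPLE_0XBD, MICROSOFT_0XBD))  # True
```

Normalizing decoded text this way is one possible workaround when comparing strings produced by the two mappings. It does not help when re-encoding, since a codec that only knows one of the two code points may reject the other.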

[1] http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.T...

[2] http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/MAC/RO...