by kissiel 1400 days ago
What the hell is this site's encoding? However I try to decode it, I get a misdecoded 'ü'.
3 comments

Originally UTF-8, then decoded as Latin-1, then re-encoded to UTF-8. You can recompute the exact bytes in the page with this Python code:

    >>> 'Zürich'.encode('utf-8').decode('latin1').encode('utf-8')
    b'Z\xc3\x83\xc2\xbcrich'
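The same chain can be run backwards to repair such text, as long as no bytes were lost along the way. A minimal sketch, assuming the middle step really was Latin-1 (and not, say, Windows-1252); the helper name `fix_mojibake` is made up for illustration:

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of UTF-8 -> Latin-1 -> UTF-8 double encoding."""
    try:
        # Map each character back to the byte it was misread from,
        # then decode those bytes as the UTF-8 they originally were.
        return text.encode('latin1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded in this particular way; leave it alone.
        return text


broken = b'Z\xc3\x83\xc2\xbcrich'.decode('utf-8')
print(fix_mojibake(broken))    # Zürich
print(fix_mojibake('Zürich'))  # Zürich (already correct, returned unchanged)
```

Falling back to the input on error keeps text that was never double-encoded from being mangled.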
I think it's a double-encoding error. The text was encoded as UTF-8, those bytes were misread as individual code points (i.e. as Latin-1), and the result was encoded as UTF-8 again. ü in UTF-8 is the byte pair c3 bc, and U+00C3 and U+00BC are Ã and ¼ respectively.

    $ echo 'Zürich' | iconv -f latin1 -t utf-8
    Zürich
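The byte-level story can also be traced one stage at a time. A rough sketch in Python (the variable names are only for illustration), printing the bytes and code points at each step:

```python
import unicodedata

utf8_bytes = 'ü'.encode('utf-8')
print(utf8_bytes.hex(' '))  # c3 bc

# Misread each byte as one Latin-1 character.
misread = utf8_bytes.decode('latin1')
for ch in misread:
    print(f'U+{ord(ch):04X}', ch, unicodedata.name(ch))
# U+00C3 Ã LATIN CAPITAL LETTER A WITH TILDE
# U+00BC ¼ VULGAR FRACTION ONE QUARTER

# Encoding those two characters as UTF-8 gives the bytes seen on the page.
double_encoded = misread.encode('utf-8')
print(double_encoded.hex(' '))  # c3 83 c2 bc
```

`bytes.hex()` with a separator argument needs Python 3.8 or later.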
FTA: “When ETH Zürich started its own computer science program in the 60s, buying computers from the US turned out to be a bit of an issue. They were expensive and often unsuitable for European use (what with our strange umlauts and stubborn insistence on speaking languages different from English)”

I guess they started buying computers from the US :-)