
> OTOH: Which characters do and don't belong in unicode and in what order? I don't fucking know. :-)

Should we use decimalized time or time based on the Babylonian base 60/12 system? Both have clear advantages. I don't fucking know. :-)

The world has standardized on Unicode, which (as a collection of expanding standards) defines the set of valid code points and their order. There's still some debate over which encoding to use: UTF-8 vs. UTF-16LE (and perhaps UTF-16 with a BOM, or UTF-32). Unicode isn't perfect, but it's silly to pretend it hasn't won.
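To make the code point vs. encoding distinction concrete, here's a small Python sketch (my own illustration, with `ch` and `encodings` as made-up names): one code point, U+00E9, serialized under three of the encodings mentioned above.

```python
# One Unicode code point, several byte-level encodings.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The code point itself is fixed by Unicode.
code_point = ord(ch)  # 0xE9

# The bytes differ depending on which encoding you pick.
encodings = {
    name: ch.encode(name)
    for name in ("utf-8", "utf-16-le", "utf-32-le")
}

for name, raw in encodings.items():
    print(f"U+{code_point:04X} as {name}: {raw.hex(' ')}")
```

Same character, same code point, three different byte sequences. That's the part people still argue about.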

Source: I used to work as an engineer on the content converter portion of Google's indexing system, which took the world's web pages, PDFs, etc. and converted them into a unified format (the text portion of which is encoded as UTF-8) for the rest of the indexing system. Sure, we saw some percentage of EUC-KR, GB2312, Big5, and Windows-1252 (CP1252) text, but Unicode has clearly won, and UTF-8 and UTF-16LE are steadily replacing all other encodings.
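For a rough idea of what "convert everything to UTF-8" looks like, here's a minimal Python sketch. To be clear, this is not how the actual converter worked: `to_utf8`, `declared`, and `FALLBACKS` are hypothetical names, and trying encodings in a fixed order is a crude stand-in for real charset detection. It only assumes Python's standard codecs.

```python
from typing import Optional

# Hypothetical fallback order; a real system would use proper charset
# detection rather than guessing in sequence.
FALLBACKS = ["utf-8", "euc_kr", "gb2312", "big5", "cp1252"]


def to_utf8(raw: bytes, declared: Optional[str] = None) -> bytes:
    """Decode raw bytes from a legacy encoding and re-encode as UTF-8."""
    # Trust the declared charset (e.g. from an HTTP header) first, if any.
    candidates = ([declared] if declared else []) + FALLBACKS
    for enc in candidates:
        try:
            return raw.decode(enc).encode("utf-8")
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    # Last resort: keep what we can and mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


korean = "\ud55c\uad6d\uc5b4".encode("euc_kr")   # EUC-KR bytes for a Korean word
print(to_utf8(korean, declared="euc_kr").decode("utf-8"))
```

The hard part in practice is the detection, since many byte sequences are valid in more than one legacy encoding. Once you know the source charset, the conversion itself is the easy bit.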