by thaumasiotes 4665 days ago
Well... China's favorite encoding is GB, which encodes ASCII values as one byte and common Chinese characters as two bytes. It's hard to see how UTF-8 (one byte for ASCII, three for Chinese characters) would beat that, on the assumption that nearly 100% of what a Chinese website would want to transmit is either ASCII (where UTF-8 is equivalent to GB) or Chinese (where it's inferior).
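
To make the size argument concrete, here is a rough sketch using Python's built-in `gb18030` and `utf-8` codecs. The helper name `encoded_sizes` and the sample strings are just mine for illustration, not anything from a real site.

```python
def encoded_sizes(text):
    """Return (GB18030 byte count, UTF-8 byte count) for the given text."""
    return len(text.encode("gb18030")), len(text.encode("utf-8"))


# Illustrative samples: pure ASCII, pure Chinese, and a mix of both.
for sample in ("hello", "中文", "hello 中文"):
    gb, utf8 = encoded_sizes(sample)
    print(f"{sample!r}: gb18030={gb} bytes, utf-8={utf8} bytes")
```

For ASCII the two counts match; for the two Chinese characters GB needs 4 bytes where UTF-8 needs 6.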

How do I know GB is preferred? I'm going off of three things:

- According to Wikipedia (http://en.wikipedia.org/wiki/GB18030), software sold in China is legally required to support it.

- I was once given a Chinese ebook, which I had to figure out was in GB before I could read it. (And now, I know about chardet!)

- I worked with a Chinese programmer who accidentally committed files in GB, even though they were supposed to be in UTF-8.

And since the latest GB (GB18030) can in fact represent any Unicode code point, it's hard to see why it wouldn't be preferred indefinitely.
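
If you want to check the "any code point" part yourself, a quick sketch with Python's built-in `gb18030` codec is below. The `round_trips` helper and the handful of sample characters are my own picks; this is a spot check, not a proof of full coverage.

```python
def round_trips(ch):
    """True if ch survives encoding to GB18030 and decoding back unchanged."""
    return ch.encode("gb18030").decode("gb18030") == ch


# Samples: ASCII, a common Chinese character, a CJK Extension B
# character (U+20000), and an emoji (U+1F600) outside the BMP.
samples = ["A", "中", "\U00020000", "\U0001F600"]

for ch in samples:
    size = len(ch.encode("gb18030"))
    print(f"U+{ord(ch):04X}: {size} bytes in gb18030, round-trips={round_trips(ch)}")
```

Characters outside the Basic Multilingual Plane take four bytes in GB18030, the same as in UTF-8, so there is no size penalty there either.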

1 comments

Okay, you are right. The comparison I was thinking of was actually about UTF-16, and of course that's not actually preferred to GB or Shift-JIS or whatever.