Well... China's favorite encoding is GB, which encodes ASCII values as one byte and common Chinese characters as two bytes. It's hard to see how UTF-8 (one byte for ASCII, three for most Chinese characters) would beat that, on the assumption that nearly 100% of what a Chinese website would want to transmit is either ASCII (where UTF-8 is equivalent to GB) or Chinese (where it's inferior).
Okay, you are right. The comparison I was thinking of was actually about UTF-16, and of course that's not actually preferred to GB or Shift-JIS or whatever.
You are tremendously wrong. Almost every legacy CJK encoding encodes a string in fewer bytes than UTF-8 does, as long as the string contains no characters the encoding doesn't support. I have seen lots of (mostly misguided) people who prefer those legacy encodings over UTF-8/16 solely for this reason.
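For anyone who wants to check the byte counts themselves, here is a rough sketch in Python. The sample strings and the `byte_lengths` helper are made up for illustration; the codec names are the ones that ship with Python's standard library.

```python
# Compare encoded sizes: a legacy CJK codec vs UTF-8.
# The sample strings below are arbitrary illustrations, not a benchmark.
samples = {
    "gb18030": "中文网站",
    "big5": "中文網站",
    "shift_jis": "日本語",
    "euc_kr": "한국어",
}

def byte_lengths(text, legacy_codec):
    """Return (bytes in legacy codec, bytes in UTF-8) for the same text."""
    return len(text.encode(legacy_codec)), len(text.encode("utf-8"))

for codec, text in samples.items():
    legacy, utf8 = byte_lengths(text, codec)
    print(f"{codec:10} {text}: {legacy} bytes vs {utf8} bytes in UTF-8")

# Pure ASCII text comes out the same size either way.
print(byte_lengths("hello", "gb18030"))
```

For these samples, each CJK character takes two bytes in the legacy codec and three in UTF-8, so the legacy form is a third smaller before any compression is applied.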
How do I know GB is preferred? I'm going off of three things:
- According to Wikipedia (http://en.wikipedia.org/wiki/GB18030), software sold in China is legally required to support it.
- I was once given a Chinese ebook, which I had to figure out was in GB before I could read it. (And now, I know about chardet!)
- I worked with a Chinese programmer who accidentally committed files in GB, even though they were supposed to be in UTF-8.
And since the latest GB (GB18030) can in fact represent any Unicode code point, it's hard to see why it wouldn't be preferred indefinitely.
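A quick way to see the "any Unicode code point" part in practice, sketched with Python's built-in `gb18030` codec. The `roundtrips_gb18030` helper and the sample strings are just illustrations I picked, including one character outside the Basic Multilingual Plane.

```python
def roundtrips_gb18030(text):
    """True if text survives encoding to GB18030 and decoding back unchanged."""
    return text.encode("gb18030").decode("gb18030") == text

# Arbitrary samples: ASCII, Chinese, Japanese, Korean, Greek, and an emoji (U+1F600).
samples = ["ASCII", "中文", "日本語", "한국어", "Ελληνικά", "\U0001F600"]

for s in samples:
    encoded = s.encode("gb18030")
    print(repr(s), len(encoded), "bytes, round-trips:", roundtrips_gb18030(s))
```

Anything outside the one-byte and two-byte ranges falls back to four-byte sequences, so the size advantage only holds for text that is mostly ASCII and common Chinese characters.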