by oofabz
4545 days ago
I think the size issue is a red herring. UTF-8 wins some, UTF-16 wins others, but either encoding is acceptable. There is no clear winner here so we should look at other properties.

UTF-8 is more reliable, because mishandling variable-length characters is more obvious. In UTF-16 it's easy to write something that works with the BMP and call it good enough. Even worse, you may not even know it fails above the BMP, because those characters are so rare you might never test with them. But in UTF-8, if you screw up multi-byte characters, any non-ASCII character will trigger the bug, and you will fix your code more quickly.

Also, UTF-8 does not suffer from endianness issues like UTF-16 does. Few people use the BOM and no one likes it. And most importantly, UTF-8 is compatible with ASCII.
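To make that concrete, here is a rough Python sketch. The helper name `unit_counts` and the three sample strings are just my own illustration, not anything from a library. It counts code points, UTF-8 bytes, and UTF-16 code units for an ASCII character, a non-ASCII BMP character, and a character above the BMP.

```python
def unit_counts(text: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-16 code units) for text.

    Hypothetical helper for illustration only.
    """
    code_points = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    # "utf-16-le" writes no BOM, so every 2 bytes is exactly one code unit.
    utf16_units = len(text.encode("utf-16-le")) // 2
    return code_points, utf8_bytes, utf16_units


samples = {
    "ASCII 'A'": "A",
    "BMP U+00E9": "\u00e9",        # e with acute accent
    "astral U+1F600": "\U0001F600",  # an emoji above the BMP
}

for label, text in samples.items():
    cps, u8, u16 = unit_counts(text)
    print(f"{label}: {cps} code point, {u8} UTF-8 byte(s), {u16} UTF-16 unit(s)")

# ASCII compatibility: the UTF-8 bytes of ASCII text are the ASCII bytes.
print("A".encode("utf-8") == "A".encode("ascii"))

# Endianness: the same text has two different UTF-16 byte sequences.
print("A".encode("utf-16-le"), "A".encode("utf-16-be"))
```

Code that assumes "one unit is one character" already breaks on U+00E9 in UTF-8 (2 bytes), but in UTF-16 it keeps working until something like U+1F600 shows up (2 code units), which is exactly the kind of input that tends to be missing from tests.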
UTF-32 is probably what you're thinking of.