by sp332
5878 days ago
>> But UTF-8 has a dark side, a single character can take up anywhere between one to six bytes to represent in binary.

> What? No! UTF-8 takes, at most, 4 bytes per code point.

I thought each half of a UTF-16 surrogate pair used 3 bytes in UTF-8, but it turns out that this is an incompatible modification of UTF-8 called CESU-8. http://en.wikipedia.org/wiki/CESU-8
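
To make the 4-versus-6 byte difference concrete, here is a rough Python sketch. The helper names `utf8_len` and `cesu8_encode` are made up for illustration (Python has no built-in CESU-8 codec as far as I know), and the encoder only follows the scheme as I understand it: split any code point above U+FFFF into a UTF-16 surrogate pair, then encode each half as a 3-byte sequence.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes standard UTF-8 uses for one code point."""
    return len(ch.encode("utf-8"))


def cesu8_encode(text: str) -> bytes:
    """Hypothetical CESU-8 encoder (illustration only, not a library API).

    Code points up to U+FFFF are encoded exactly as in UTF-8.
    Supplementary code points are split into a UTF-16 surrogate pair,
    and each surrogate is written as its own 3-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            v = cp - 0x10000
            high = 0xD800 | (v >> 10)    # high (lead) surrogate
            low = 0xDC00 | (v & 0x3FF)   # low (trail) surrogate
            for surrogate in (high, low):
                # "surrogatepass" lets Python emit the 3-byte form that
                # strict UTF-8 forbids for surrogate code points.
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


emoji = "\U0001F600"  # a code point outside the Basic Multilingual Plane
print(utf8_len("A"), utf8_len("\u00e9"), utf8_len("\u20ac"), utf8_len(emoji))
print(emoji.encode("utf-8").hex(), cesu8_encode(emoji).hex())
```

The same code point comes out as 4 bytes in UTF-8 and 6 bytes in CESU-8, and a strict UTF-8 decoder should reject the 6-byte form, which is why the two are incompatible.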