by sp332 5878 days ago
>>But UTF-8 has a dark side, a single character can take up anywhere between one to six bytes to represent in binary.

>What? No! UTF-8 takes, at most, 4 bytes per code point.

I thought each half of a UTF-16 surrogate pair took 3 bytes in UTF-8 (so 6 bytes for a character outside the BMP), but it turns out that's an incompatible variant of UTF-8 called CESU-8: http://en.wikipedia.org/wiki/CESU-8
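
To make the 4-vs-6 byte difference concrete, here's a rough Python sketch. `cesu8_encode` is a helper I made up for illustration (Python has no built-in CESU-8 codec as far as I know); it splits supplementary characters into surrogates by hand and pushes each one through the UTF-8 encoder with the `surrogatepass` error handler.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8 (illustration only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split the code point into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            # Each surrogate is encoded as its own 3-byte sequence.
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            # BMP characters are encoded the same way as in UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


ch = "\U0001F600"            # a code point outside the BMP
utf8 = ch.encode("utf-8")    # real UTF-8: one 4-byte sequence
cesu8 = cesu8_encode(ch)     # CESU-8: two 3-byte sequences

print(len(utf8), utf8.hex())    # 4 f09f9880
print(len(cesu8), cesu8.hex())  # 6 eda0bdedb880
```

A strict UTF-8 decoder should reject the CESU-8 bytes, since encoded surrogates aren't valid UTF-8. That's the "incompatible" part.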