by sp332 5925 days ago

>>But UTF-8 has a dark side, a single character can take up anywhere between one to six bytes to represent in binary.

>What? No! UTF-8 takes, at most, 4 bytes per code point.

I thought each half of a UTF-16 surrogate pair used 3 bytes in UTF-8, which would make 6 bytes for one character outside the BMP, but it turns out that encoding surrogates this way is an incompatible modification of UTF-8 called CESU-8. http://en.wikipedia.org/wiki/CESU-8
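
To make the difference concrete, here is a rough Python sketch. Python has no built-in CESU-8 codec, so `cesu8_encode` is a hypothetical helper I wrote for illustration: it splits any code point above U+FFFF into its UTF-16 surrogate pair and encodes each half as a 3-byte sequence. The `surrogatepass` error handler is a real part of Python's UTF-8 codec and is only there so a lone surrogate can be encoded at all.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8 (not a stdlib codec)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP code points are encoded exactly as in UTF-8 (1 to 3 bytes).
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Split into a UTF-16 surrogate pair, then encode each half
            # as its own 3-byte sequence.
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)
            low = 0xDC00 + (v & 0x3FF)
            for surrogate in (high, low):
                out += chr(surrogate).encode("utf-8", "surrogatepass")
    return bytes(out)


ch = "\U0001F600"  # a code point outside the BMP
utf8 = ch.encode("utf-8")
cesu8 = cesu8_encode(ch)
print(len(utf8), utf8.hex())    # standard UTF-8: one 4-byte sequence
print(len(cesu8), cesu8.hex())  # CESU-8: two 3-byte sequences
```

A strict UTF-8 decoder should reject the CESU-8 bytes, since encoded surrogates are not valid UTF-8, which is why the two are incompatible.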