|
|
|
|
|
by mehrdadn
2867 days ago
|
|
I don't see anything wrong with what you're saying, but I still don't get how it explains the original comment I replied to [1]: > I'm not at all convinced that 2^21 codepoints will be enough, so someday it'd be nice to be able to get past UTF-16 and move to UTF-8 UTF-16 currently uses up to 2 16-bit code units per code point, whereas UTF-8 uses up to 4 8-bit code units per code point, and the latter wastes more bits for continuation than the former. How is "getting past UTF-16 and moving to UTF-8" supposed to increase the number of code points we can represent, as claimed above? If anything, UTF-16 wastes fewer bits in the current maximum number of code units, so it should have more room for expansion without increasing the number of code units. [1] https://news.ycombinator.com/item?id=17771351 |
|
And as you can see, if you do work out the bits, you find that cryptonector is wrong, since UTF-8 (as it has been standardized from almost the start of the 21st century, and as codecs in the real world have taken to implementing it since) encodes no more bits than UTF-16 does. It's 21 bits for both.