by jstimpfle
3782 days ago
My parent was talking about indexing at the code point level, not at the encoding (byte / code unit) level. I do know that Unicode has combining code points (confusingly called combining characters) and nasty things like RTL-switching code points. I guess it's turtles all the way down.
|
You need UTF-32 for (random) indexing of code points. UTF-16 has 16-bit code units, and code points outside the Basic Multilingual Plane take two of them (32 bits), encoded as a surrogate pair.
So it's the same trade-off as with UTF-8: variable width, no constant-time indexing by code point. Thus there's no reason not to just use UTF-8 in the first place and take advantage of the memory savings (at least for mostly-ASCII text).
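
To make that concrete, here is a rough Python sketch (not from any particular library; the function names `code_unit_counts` and `nth_code_point_utf8` and the sample string are made up for illustration). It counts code units per encoding and shows that finding the n-th code point in UTF-8 needs a linear scan. The scan assumes the input is well-formed UTF-8.

```python
def code_unit_counts(text: str) -> dict:
    """Count code points and code units of `text` in each encoding."""
    return {
        "code_points": len(text),                           # Python 3 str indexes by code point
        "utf8_units": len(text.encode("utf-8")),            # 8-bit code units
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf32_units": len(text.encode("utf-32-le")) // 4,  # 32-bit code units
    }


def nth_code_point_utf8(data: bytes, n: int) -> str:
    """Return the n-th (0-based) code point of well-formed UTF-8 `data`.

    There is no way to jump straight to it: we have to walk the bytes
    and count the ones that start a code point.
    """
    seen = -1
    for i, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte, so a new code point starts here
            seen += 1
            if seen == n:
                j = i + 1
                while j < len(data) and data[j] & 0xC0 == 0x80:
                    j += 1  # swallow the continuation bytes
                return data[i:j].decode("utf-8")
    raise IndexError("code point index out of range")


# 'a' (U+0061), 'e' with acute (U+00E9), euro sign (U+20AC), an emoji outside the BMP (U+1F600)
sample = "a\u00e9\u20ac\U0001F600"
counts = code_unit_counts(sample)
print(counts)
print(nth_code_point_utf8(sample.encode("utf-8"), 3) == "\U0001F600")
```

Only the UTF-32 count matches the number of code points; the emoji costs two units in UTF-16 and four in UTF-8, so both need a scan like the one above to index by code point.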