| HN Mirror

UTF-16 is a variable-length encoding using up to two code units which each are 16-bits wide.

UTF-8 is a variable-length encoding using up to 4 code units (though it used to be up to 6, and could again be up to 6) each of which are 8-bits wide.

Both, UTF-16 and UTF-8 are variable-length encodings!

UTF-32 is not variable-length, but even so, the way Unicode works a character like ´ (á) can be written in two different ways, one of which requires one codepoint and one of which requires two (regardless of encoding), while ṻ (LATIN SMALL LETTER U WITH MACRON AND DIAERESIS) can be written in up to five different ways requiring from one to three different codepoints (regardless of encoding).

Not every character has a one-codepoint representation in Unicode, or at least not every character has a canonically-pre-composed one-codepoint representation in Unicode.

Therefore, many characters in Unicode can be expected to be written in multiple codepoints regardless of encoding. Therefore all programmers dealing with text need to be prepared for being unable to do an O(1) array index operation to get at the nth character of a string.

(In UTF-32 you can do an O(1) array index operation to get to the nth codepoint, not character, but one is usually only ever interested in getting the nth character.)