by Skalman
4001 days ago
It's an encoding that isn't good at anything: it's neither ASCII-compatible (like UTF-8) nor fixed-length (like UTF-32). And because most characters require only 2 bytes, developers frequently assume that none require more, which leads to bugs when a character eventually needs 4 bytes.
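To make the 2-byte assumption concrete, here is a minimal Python sketch. It assumes the encoding under discussion is UTF-16 (the comment does not name it, but the 2-byte/4-byte description matches); the helper name `utf16_units` is made up for illustration.

```python
def utf16_units(s: str) -> int:
    """Return the number of 16-bit code units needed to encode s as UTF-16."""
    # "utf-16-le" is used so that no byte order mark is prepended.
    # Every UTF-16 code unit is exactly 2 bytes.
    return len(s.encode("utf-16-le")) // 2

bmp_char = "\u00e9"         # U+00E9, inside the Basic Multilingual Plane
astral_char = "\U0001F600"  # U+1F600, outside the Basic Multilingual Plane

print(utf16_units(bmp_char))     # one code unit, i.e. 2 bytes
print(utf16_units(astral_char))  # two code units (a surrogate pair), i.e. 4 bytes
```

Code that indexes or truncates by code unit works for the first string and can split the second one in the middle of its surrogate pair.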
UTF-32 is only fixed-length if you don't care about diacritics, variation selectors, RTL languages, and others. Unicode does not give you one code point, or one char/wchar/uint32, per glyph.
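A rough Python sketch of that point, covering only the diacritic and variation-selector cases: even in UTF-32, what a user sees as one character can take more than one 32-bit unit. The helper name `utf32_units` is made up for illustration.

```python
import unicodedata

def utf32_units(s: str) -> int:
    """Return the number of 32-bit code units (= code points) in s."""
    # "utf-32-le" is used so that no byte order mark is prepended.
    return len(s.encode("utf-32-le")) // 4

# "e" followed by U+0301 COMBINING ACUTE ACCENT renders as a single "é".
decomposed = "e\u0301"
# NFC normalization merges the pair into the precomposed U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# U+2764 HEAVY BLACK HEART followed by U+FE0F VARIATION SELECTOR-16.
heart_emoji = "\u2764\ufe0f"

print(utf32_units(decomposed))   # two code points for one visible character
print(utf32_units(composed))     # one code point after normalization
print(utf32_units(heart_emoji))  # two code points for one visible symbol
```

Normalization happens to help in the first case only because a precomposed form exists; there is no single-code-point form of the heart plus its variation selector.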