by loeg
2810 days ago
UTF-8 is ASCII-compatible. Every byte with the high bit cleared (0x00-0x7F) represents the same character as in ASCII. All codepoints >= 0x80 are encoded as multiple bytes, each with the high bit (0x80) set. UTF-8 is a very elegant construct for Unix-type C systems: since a 0x00 byte never appears inside a multi-byte sequence, you could basically reuse all your nul-terminated string APIs.
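A quick way to check these properties yourself is the rough Python sketch below. The helper names (`high_bit_set`, `c_strlen`) and the sample strings are made up for illustration, and `c_strlen` only imitates what C's `strlen` does on a byte buffer.

```python
def high_bit_set(byte: int) -> bool:
    """True if the byte has its top bit (0x80) set."""
    return (byte & 0x80) != 0


def c_strlen(buf: bytes) -> int:
    """Imitate C strlen: count bytes before the first 0x00."""
    return buf.index(0)


ascii_text = "hello"
mixed_text = "naïve café ☃"  # hypothetical sample with 2- and 3-byte characters

# Pure ASCII text is byte-for-byte identical in UTF-8.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

for ch in mixed_text:
    seq = ch.encode("utf-8")
    if ord(ch) < 0x80:
        # ASCII range: one byte, high bit clear.
        assert len(seq) == 1 and not high_bit_set(seq[0])
    else:
        # Everything else: multiple bytes, every one with the high bit set.
        assert len(seq) > 1 and all(high_bit_set(b) for b in seq)

# No 0x00 byte occurs inside the encoded text, so a trailing nul
# terminator is found exactly at the end, as strlen would expect.
encoded = mixed_text.encode("utf-8")
assert c_strlen(encoded + b"\x00") == len(encoded)
```

Note that the length reported this way is a byte count, not a character count, which is the same caveat that applies to `strlen` on UTF-8 data in C.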