by imron
3371 days ago
We are in agreement that the only time a single zero byte can be found in well-formed UTF-8 is for the NUL character. By definition, with a null-terminated string, NUL is the terminator. If you want to have strings that contain NUL, then by definition you can't use a null-terminated string. This is true of UTF-8 or regular C strings.

If someone passes you a text file that is verified to be valid UTF-8 and contains, say, access permissions, then you had better not stop parsing it at the first '\0' character.
None of this is a huge problem, but it's something to be aware of: NUL (U+0000) is a valid UTF-8 character, so C's null-terminated string handling cannot represent every valid UTF-8 text.
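
To make that concrete, here is a minimal C sketch. The buffer contents and the names `data`, `data_len`, and `count_byte` are made up for illustration; the bytes are plain ASCII, which is also valid UTF-8, with one embedded NUL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sample: valid UTF-8 containing an embedded NUL (U+0000). */
static const char data[] = "user=alice\0admin=true";

/* sizeof includes the implicit trailing terminator, so subtract it. */
static const size_t data_len = sizeof data - 1;

/* Count occurrences of byte c, using an explicit length instead of
 * relying on a terminator, so embedded NULs do not end the scan. */
static size_t count_byte(const char *buf, size_t len, char c) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == c) {
            n++;
        }
    }
    return n;
}
```

Here `strlen(data)` reports 10, because it stops at the embedded NUL, while `data_len` is 21. A parser built on `strlen` would never see the `admin=true` part; `count_byte(data, data_len, '=')` walks the whole buffer and finds both `=` bytes. Carrying the length alongside the pointer is one common way around the problem, not the only one.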