by dbaupp
3371 days ago
> A zero-byte in a utf-8 string is the null-terminator and utf-8 strings can be treated exactly like C strings in terms of where the string ends.

No, the parent was correct: UTF-8 encodes NUL (i.e. \0) as a single zero byte. In contrast, Modified UTF-8[1] uses an overlong encoding for NUL, so there's never any possibility of an internal zero byte. Of course, an application/library can choose to restrict itself to only handling UTF-8 that doesn't contain internal NULs, but the spec itself allows for zero bytes in a string.

[1]: https://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8
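
To make the difference concrete, here is a minimal Python sketch. The `modified_utf8` helper is hypothetical and only models the NUL rule of Modified UTF-8 (it ignores the surrogate-pair rule for supplementary characters), so treat it as an illustration rather than a real encoder.

```python
# Standard UTF-8: U+0000 is encoded as one zero byte, so a valid
# UTF-8 string may contain an internal 0x00.
text = "a\u0000b"
standard = text.encode("utf-8")


def modified_utf8(s: str) -> bytes:
    """Hypothetical helper: apply only the NUL rule of Modified UTF-8.

    In valid UTF-8 a zero byte can only come from U+0000, so replacing
    every 0x00 with the overlong pair C0 80 is safe here. Supplementary
    characters are NOT handled the way real Modified UTF-8 handles them.
    """
    return s.encode("utf-8").replace(b"\x00", b"\xc0\x80")


modified = modified_utf8(text)

print(standard)   # contains an internal zero byte
print(modified)   # contains no zero byte at all
```

A strict UTF-8 decoder rejects the `C0 80` form as an overlong sequence, which is why Modified UTF-8 is a distinct encoding rather than a subset of UTF-8.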
|
By definition, with a null-terminated string, NUL is the terminator.
If you want to have strings that contain NUL, then by definition you can't use a null-terminated string.
This is true of UTF-8 strings and regular C strings alike.
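
A small Python sketch of that trade-off, under stated assumptions: `c_string_view` is a hypothetical helper that mimics how C's `strlen`-style code sees a buffer, and the 4-byte little-endian length prefix is just one illustrative alternative, not a format anyone in this thread proposed.

```python
def c_string_view(buf: bytes) -> bytes:
    """Hypothetical helper: what null-terminated code sees in buf.

    Everything from the first zero byte onward is invisible, exactly
    as it would be to strlen() in C.
    """
    end = buf.find(b"\x00")
    return buf if end == -1 else buf[:end]


def length_prefixed(payload: bytes) -> bytes:
    """Illustrative alternative: store the length up front (4 bytes, little-endian)."""
    return len(payload).to_bytes(4, "little") + payload


def read_length_prefixed(buf: bytes) -> bytes:
    """Read back a payload written by length_prefixed()."""
    n = int.from_bytes(buf[:4], "little")
    return buf[4:4 + n]


data = "a\u0000b".encode("utf-8")  # valid UTF-8 with an internal NUL

truncated = c_string_view(data)                           # stops at the NUL
round_tripped = read_length_prefixed(length_prefixed(data))  # keeps all bytes
```

With null termination the bytes after the first NUL are lost; with an explicit length the same UTF-8 data survives intact.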