by ynik
2178 days ago
The bytestring was truncated after 32 bytes, in the middle of a UTF-8 byte sequence.
This means the resulting truncated string is not valid UTF-8 anymore.
So my guess is that most devices decide "if it's not valid UTF-8, it must be $LEGACY_ENCODING".
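A minimal sketch of that guess, in Python. The helper names, the 32-byte limit as a default parameter, and the choice of Latin-1 as a stand-in for $LEGACY_ENCODING are all assumptions for illustration; real devices may fall back to something else entirely.

```python
def truncate_bytes(text: str, limit: int = 32) -> bytes:
    # Naive truncation: cuts at a byte offset without checking
    # whether that offset falls inside a multi-byte sequence.
    return text.encode("utf-8")[:limit]


def decode_with_legacy_fallback(data: bytes, legacy: str = "latin-1") -> str:
    # Hypothetical device behaviour: try UTF-8 first and, if that fails,
    # reinterpret the whole string in a legacy encoding (assumed Latin-1).
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(legacy)


# 31 ASCII bytes followed by "é", which is two bytes (0xC3 0xA9) in UTF-8.
name = "a" * 31 + "é"
raw = truncate_bytes(name)                # the cut lands between 0xC3 and 0xA9
shown = decode_with_legacy_fallback(raw)  # the dangling 0xC3 is read as Latin-1
```

Here `raw` ends in a lone 0xC3 lead byte, so strict UTF-8 decoding fails and the fallback renders that byte as "Ã" instead of the intended "é".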
|
The other is that for any code unit that won't decode, you emit U+FFFD, the Unicode Replacement Character, and then carry on decoding.
For humans, U+FFFD makes it obvious something is wrong: it's typically rendered as a black diamond with a white question mark. For a machine, it shouldn't match parsing rules: it isn't alphanumeric, and it isn't any of the common separator or spacing characters, so it's unlikely to be of use in an attack.
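As a rough illustration of the replacement approach, Python's built-in `errors="replace"` handler for `bytes.decode` does this. The input below is a made-up example, not data from the thread.

```python
# 31 ASCII bytes plus "é" (0xC3 0xA9), cut to 32 bytes so the
# final byte is a dangling UTF-8 lead byte.
raw = ("a" * 31 + "é").encode("utf-8")[:32]

# errors="replace" substitutes U+FFFD for the undecodable byte
# and keeps everything that decoded cleanly.
decoded = raw.decode("utf-8", errors="replace")

replacement = "\ufffd"
ends_with_replacement = decoded.endswith(replacement)

# U+FFFD is neither alphanumeric nor whitespace, so typical
# tokenising or validation rules should not treat it as meaningful.
is_alnum = replacement.isalnum()
is_space = replacement.isspace()
```

The valid prefix survives intact and only the broken tail becomes U+FFFD, so nothing gets reinterpreted in a different encoding.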