|
|
|
|
|
by jrabone
4948 days ago
|
|
It is totally arbitrary - there's no reason you can't have degenerate 6-byte encodings, and compliant decoders should cope with them. See Marcus Kuhn's excellent UTF-8 decoder torture test page linked elsewhere in this thread. |
|
Note that Kuhn's torture test page deliberately includes a lot of invalid sequences in order to make sure that they're gracefully handled. Section 4 is dedicated to degenerate encodings like this.