by nuc1e0n
780 days ago
Code points can only take 1 to 4 UTF-8 bytes. UTF-8's bit pattern can extend up to 6 bytes, but the highest valid Unicode code point is U+10FFFF (1,114,111 in decimal), and U+10FFFF takes 4 bytes to encode in UTF-8 in non-overlong form. I guess you could encode it overlong with a 5- or 6-byte sequence, but UTF-8 should only ever be encoded in its shortest form, so anything else should be considered invalid and potentially harmful.
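To make that concrete, here is a rough Python sketch. The helper names `utf8_len` and `is_valid_utf8` are made up for illustration; the only real API used is the standard `str.encode` / `bytes.decode`. It just checks the byte count for the top code point and shows that Python's strict decoder refuses overlong and 5-byte forms.

```python
def utf8_len(cp: int) -> int:
    """Number of bytes in the shortest-form UTF-8 encoding of a code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode range U+0000..U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def is_valid_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The highest code point needs exactly 4 bytes in shortest form.
top = chr(0x10FFFF).encode("utf-8")

# Hand-built overlong forms: '/' (U+002F) padded to 2 bytes, and
# U+10FFFF padded into the old 5-byte pattern (lead byte 0xF8).
overlong_slash = bytes([0xC0, 0xAF])
overlong_top = bytes([0xF8, 0x84, 0x8F, 0xBF, 0xBF])

print(top.hex(), len(top), utf8_len(0x10FFFF))
print(is_valid_utf8(top), is_valid_utf8(overlong_slash), is_valid_utf8(overlong_top))
```

The overlong `/` is the classic case for "potentially harmful": a filter that scans raw bytes for `0x2F` would miss it, while a lenient decoder would still turn it into a slash.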