by account42
2458 days ago
Then concatenating two valid Unicode documents would no longer produce a valid Unicode document. That is bad.
And ASCII text would no longer be a valid UTF-8 encoded Unicode document. That is bad.
And even when everything has finally switched to UTF-8, every tool ever will still need to handle the BOM. That is bad.
Guessing between valid UTF-8 and Latin-1 is only ever ambiguous when the non-ASCII characters always come several in a row and every one of those sequences is made up of a lead byte followed by the correct number of trailing bytes. How often is that a problem for you in practice?
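
To make both points concrete, here is a rough Python sketch. The `guess_encoding` helper is hypothetical (not from any library) and only implements the simple "try UTF-8, fall back to Latin-1" heuristic described above; the second part shows what a mandatory BOM does to naive concatenation.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: classify bytes as 'ascii', 'utf-8', or 'latin-1'."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Strict decoding fails unless every lead byte is followed by
        # the correct number of trailing (continuation) bytes.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Every byte string is decodable as Latin-1, so it is the fallback.
        return "latin-1"


# Latin-1 "café": 0xE9 looks like a UTF-8 lead byte but has no trailing bytes.
latin1_guess = guess_encoding(b"caf\xe9")

# The same word encoded as UTF-8 decodes cleanly.
utf8_guess = guess_encoding("café".encode("utf-8"))

# The ambiguous case: Latin-1 "Ã©" has the same bytes as UTF-8 "é".
ambiguous = "Ã©".encode("latin-1")
same_bytes = ambiguous == "é".encode("utf-8")

# Concatenating two BOM-prefixed documents leaves a stray U+FEFF in the middle:
# the "utf-8-sig" codec only strips the BOM at the very start.
doc_a = codecs.BOM_UTF8 + b"foo"
doc_b = codecs.BOM_UTF8 + b"bar"
joined = (doc_a + doc_b).decode("utf-8-sig")
```

Only text like the "Ã©" case gets misdetected by this heuristic, and real Latin-1 text rarely consists solely of such sequences.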