
Then concatenating two valid Unicode documents would no longer produce valid Unicode. That is bad. And ASCII text would no longer be a valid UTF-8 encoded Unicode document. That is bad. And even when everything has finally switched to UTF-8, every tool ever will still need to handle the BOM. That is bad.
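
To make the concatenation point concrete, here is a minimal Python sketch. It assumes a hypothetical convention where every UTF-8 document starts with a BOM; the names `doc_a`, `doc_b` and `joined` are made up for illustration. Python's `utf-8-sig` codec only strips a leading BOM, so the second one survives as a stray U+FEFF in the middle of the text.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes b"\xef\xbb\xbf"

# Two documents, each following the hypothetical "always start with a BOM" rule.
doc_a = BOM + "hello ".encode("utf-8")
doc_b = BOM + "world".encode("utf-8")

# Plain byte concatenation, i.e. what `cat a b` would do.
joined = doc_a + doc_b

# "utf-8-sig" removes a BOM only at the very start of the input.
text = joined.decode("utf-8-sig")
stray_bom_inside = "\ufeff" in text  # the second BOM is still in there

# Without a mandatory BOM, pure ASCII bytes already decode as UTF-8 unchanged.
ascii_is_utf8 = b"plain ascii".decode("utf-8") == "plain ascii"
```

So any tool that joins files would have to know about the BOM and strip it, which is exactly the extra handling being complained about.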

Guessing between valid UTF-8 and Latin-1 is only ever ambiguous when the text has multiple non-ASCII characters in a row and every such run consists of a lead byte followed by exactly the right number of trailing bytes. How often is that a problem for you in practice?
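
A rough sketch of that guessing strategy in Python, assuming the only two candidates are UTF-8 and Latin-1. The function name `guess_decode` is hypothetical, not from any library. It tries strict UTF-8 first and falls back to Latin-1, which accepts every byte sequence.

```python
from typing import Tuple


def guess_decode(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_name), preferring UTF-8 over Latin-1."""
    try:
        # Strict decoding fails unless every lead byte is followed by
        # the correct number of continuation bytes.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps each byte to one code point, so this never fails.
        return data.decode("latin-1"), "latin-1"


# "café" saved as Latin-1: the lone 0xE9 is not valid UTF-8, so no ambiguity.
latin1_case = guess_decode(b"caf\xe9")

# These bytes are "café" in UTF-8 but also "cafÃ©" in Latin-1.
# This is the rare ambiguous case; the sketch resolves it as UTF-8.
ambiguous_case = guess_decode(b"caf\xc3\xa9")
```

The ambiguous input only arises if someone really meant to write "Ã©" in Latin-1, which is the kind of text that rarely shows up outside of mojibake.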