| HN Mirror



	by rmunn 319 days ago
	That's not what I'm saying at all. I'm saying that in the absence of a BOM, a Unicode-aware app should guess UTF-8 first and other likely encodings second, because the chance of a false positive on the "is this UTF-8?" guess is practically indistinguishable from zero. If the data isn't UTF-8, the UTF-8 parsing attempt is nearly guaranteed to fail, so it's safe to try first. I'm also saying that apps should no longer write a BOM (in UTF-8 only, not in UTF-16, where it's required), because BOMs cost more to deal with than they're worth, except in certain specific circumstances, like having to deal with pre-Unicode apps that default to assuming 8-bit encodings.
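
A rough sketch of that "try UTF-8 first" order, in Python. The function name `guess_decode` and the fallback list (cp1252, then latin-1) are assumptions for illustration, not anything the comment prescribes; real apps would pick fallbacks that suit their own users. It also assumes the input has no BOM.

```python
def guess_decode(data: bytes, fallbacks=("cp1252",)):
    """Decode BOM-less bytes, trying strict UTF-8 before any legacy encoding.

    Returns (text, encoding_name). The fallback list is a hypothetical
    default chosen for this example.
    """
    try:
        # Strict decoding: bytes that are not well-formed UTF-8 raise here,
        # which is what makes this guess safe to try first.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass

    for enc in fallbacks:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue

    # latin-1 maps every byte to a code point, so it never fails.
    # It is a last resort, not a real guess.
    return data.decode("latin-1"), "latin-1"
```

For example, `guess_decode("café".encode("cp1252"))` fails the UTF-8 attempt on the lone 0xE9 byte and falls through to cp1252, while the UTF-8 encoding of the same string is accepted on the first try.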

1 comment

mikelabatt 318 days ago

Makes sense, thank you. The observation that false positives for UTF-8 tend to zero helps me understand. So I will vote for UTF-8 without a BOM from now on (while encouraging parsers to deal with one, if present).
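
One way to get that behavior in Python, as a small sketch: the standard `utf-8-sig` codec skips a leading BOM when decoding if one is present, and the plain `utf-8` codec never writes one when encoding. The helper names `read_text` and `write_text` are made up for this example.

```python
import codecs


def read_text(data: bytes) -> str:
    # "utf-8-sig" accepts input with or without a leading UTF-8 BOM
    # and drops the BOM from the result if it was there.
    return data.decode("utf-8-sig")


def write_text(text: str) -> bytes:
    # Plain "utf-8" never emits a BOM.
    return text.encode("utf-8")


with_bom = codecs.BOM_UTF8 + "héllo".encode("utf-8")
without_bom = "héllo".encode("utf-8")

# Both inputs decode to the same string, and the output carries no BOM.
assert read_text(with_bom) == read_text(without_bom) == "héllo"
assert not write_text("héllo").startswith(codecs.BOM_UTF8)
```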