| HN Mirror



	by MickerNews 2048 days ago
	It shouldn't need a BOM. Besides, UTF-8 doesn't need a BOM at all.
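A minimal Python sketch of what a "BOM" amounts to in UTF-8. Python is used only for illustration and the variable names are made up. The BOM is just U+FEFF encoded as the three bytes EF BB BF, and because UTF-8 has only one byte order, there is nothing for it to mark. Python's `utf-8-sig` codec writes it when encoding and skips it, if present, when decoding.

```python
import codecs

# In UTF-8 the "byte order mark" is simply U+FEFF encoded as three bytes.
assert codecs.BOM_UTF8 == "\ufeff".encode("utf-8") == b"\xef\xbb\xbf"

text = "naïve"
plain = text.encode("utf-8")       # no BOM
signed = text.encode("utf-8-sig")  # same bytes with EF BB BF prepended

# "utf-8-sig" decoding skips a leading BOM if there is one, so both inputs round-trip.
assert plain.decode("utf-8-sig") == text
assert signed.decode("utf-8-sig") == text

# Plain "utf-8" decoding keeps the BOM as an invisible U+FEFF at the start of the string.
assert signed.decode("utf-8") == "\ufeff" + text
```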


gruez 2048 days ago

It doesn't, but only if you assume all text is UTF-8. You can't do that on Windows, because there are plenty of legacy files that use encodings like windows-1252, which are incompatible with UTF-8.
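Here is a small Python sketch that makes the incompatibility concrete. The helper name `is_valid_utf8` is hypothetical. windows-1252 stores "é" as the single byte 0xE9, which a strict UTF-8 decoder rejects. Pure-ASCII windows-1252 files are the exception, since ASCII bytes mean the same thing in both encodings.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True

# "café" as a legacy Windows app would save it in windows-1252: é is one byte, 0xE9.
legacy = "café".encode("cp1252")
assert legacy == b"caf\xe9"

# In UTF-8, 0xE9 starts a 3-byte sequence, but no continuation bytes follow it.
assert not is_valid_utf8(legacy)

# ASCII-only text is byte-for-byte identical in both encodings, so it passes.
assert is_valid_utf8("cafe".encode("cp1252"))
```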


BlueTemplar 2048 days ago

Start by converting them to UTF-8. As a metaphor: would you want to work with files that can only be opened properly in, say, Lotus 1-2-3?
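Here is a minimal conversion sketch in Python. It assumes the source encoding is already known to be windows-1252 (Python's `cp1252` codec), and the function names are hypothetical. Decoding is strict, so a wrong guess about the source encoding raises an error instead of quietly writing garbage. Working on raw bytes also leaves CRLF line endings untouched.

```python
from pathlib import Path

def legacy_bytes_to_utf8(data: bytes, legacy_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from a known legacy code page to UTF-8 (without a BOM)."""
    # Strict decoding: bytes the code page leaves undefined (e.g. 0x81 in cp1252)
    # raise UnicodeDecodeError rather than turning into replacement characters.
    return data.decode(legacy_encoding).encode("utf-8")

def convert_file_to_utf8(src: Path, dst: Path, legacy_encoding: str = "cp1252") -> None:
    """Convert one file; bytes in, bytes out, so the newline style is preserved."""
    dst.write_bytes(legacy_bytes_to_utf8(src.read_bytes(), legacy_encoding))

# "café" followed by a Windows line ending, as stored in windows-1252.
assert legacy_bytes_to_utf8(b"caf\xe9\r\n") == b"caf\xc3\xa9\r\n"
```

Finding out which legacy encoding a file actually uses is a separate (and harder) detection problem, and this sketch doesn't attempt it.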


MickerNews 2048 days ago

The time to drop that support is long past. UTF-8 has won. Weighing the value of pandering to these ancient code pages against the cost of choking and displaying mojibake just because someone dared to add a kanji character to a CSV, the answer should be clear by now.
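Both failure modes mentioned here, mojibake and choking, are easy to reproduce in Python. This sketch uses a hypothetical reader that assumes windows-1252 for a UTF-8 CSV. "漢" (UTF-8 bytes E6 BC A2) maps to three unrelated Latin-1 characters. "丁" (E4 B8 81) contains 0x81, a byte cp1252 leaves undefined, so Python's strict decoder gives up.

```python
from typing import Optional

def read_as_cp1252(data: bytes) -> Optional[str]:
    """Decode like a legacy windows-1252 reader would; None means it choked."""
    try:
        return data.decode("cp1252")
    except UnicodeDecodeError:
        return None

csv_row = "1,漢\n".encode("utf-8")
print(read_as_cp1252(csv_row) == "1,æ¼¢\n")  # True: silent mojibake

other_row = "2,丁\n".encode("utf-8")
print(read_as_cp1252(other_row))  # None: 0x81 is undefined in cp1252

# A UTF-8 reader handles both rows.
assert csv_row.decode("utf-8") == "1,漢\n"
assert other_row.decode("utf-8") == "2,丁\n"
```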


tasogare 2048 days ago

UTF-8 has won on the web and in file storage. For programming (C#, probably Java too, the Windows API) it’s UTF-16. The good part is that it’s a Unicode encoding; the bad part is that it’s variable-width, yet it looks fixed-width for common characters.
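You can see the "looks fixed-width" trap by counting UTF-16 code units, which is what C#'s `string.Length` and Java's `String.length()` report. Here is a Python sketch (`utf16_code_units` is a hypothetical helper). Characters in the Basic Multilingual Plane, including most kanji, take one unit. Anything beyond U+FFFF, such as most emoji, takes a surrogate pair.

```python
def utf16_code_units(s: str) -> int:
    """Count 16-bit UTF-16 code units (Python's len() counts code points instead)."""
    # utf-16-le emits no BOM, so every 2 bytes is exactly one code unit.
    return len(s.encode("utf-16-le")) // 2

for sample in ["A", "é", "漢", "😀"]:
    print(f"U+{ord(sample):04X}: {utf16_code_units(sample)} code unit(s)")

# U+1F600 is stored as the surrogate pair D83D DE00.
assert "😀".encode("utf-16-be") == b"\xd8\x3d\xde\x00"
```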


masklinn 2046 days ago

> For programming (C#, probably Java too, the Windows API) it’s UTF-16.

For programming it's UTF-8, except for Windows-centric developers (and Microsoft technologies).

Java uses WTF-16 internally but rarely assumes an external charset. When it does, older APIs tend to use the default charset, which is generally ASCII-compatible at best: on Western Windows it's commonly windows-1252, though it's been UTF-8 for years on most Unices. Newer APIs like Files.newBufferedReader(Path) go straight to UTF-8.
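Python makes the same split between an implicit, platform-dependent default and an explicit charset, so here is a Python analog of the point above. It is not Java code, and the helper names are hypothetical. `Path.read_text()` without `encoding=` falls back to a locale-dependent default (roughly what `locale.getpreferredencoding(False)` reports). Naming UTF-8 explicitly gives the same result on every machine, much as `Files.newBufferedReader(Path)` does in Java.

```python
import locale
import tempfile
from pathlib import Path

def read_text_platform_default(path: Path) -> str:
    # No encoding given: the result depends on the machine's locale settings.
    return path.read_text()

def read_text_utf8(path: Path) -> str:
    # Explicit encoding: the same result everywhere.
    return path.read_text(encoding="utf-8")

print("implicit default here:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes("naïve 漢字\n".encode("utf-8"))
    assert read_text_utf8(sample) == "naïve 漢字\n"
    # On a cp1252 locale, read_text_platform_default(sample) would return mojibake.
```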


gruez 2048 days ago

And what happens to all the legacy Office versions out there, or all the legacy CSVs floating around? What you described works fine on a rolling-release distro, but would be unacceptable for enterprise software. The current approach (opt-in UTF-8, with a BOM to distinguish UTF-8 files from non-UTF-8 ones) is the best option that doesn't piss off their customers.
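Here is a sketch of the reading side of that opt-in scheme. It assumes a simple policy, not any particular vendor's documented algorithm, and the function name is hypothetical. A leading UTF-8 BOM means UTF-8, and anything else is read as the legacy code page. The last case shows the cost: BOM-less UTF-8 gets misread.

```python
import codecs

def decode_with_bom_sniffing(data: bytes, legacy_encoding: str = "cp1252") -> str:
    """Treat data as UTF-8 only if it opts in with a BOM; otherwise use the legacy code page."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(legacy_encoding)

with_bom = codecs.BOM_UTF8 + "id,name\n1,漢\n".encode("utf-8")
legacy = "id,name\n1,café\n".encode("cp1252")
no_bom_utf8 = "é".encode("utf-8")

assert decode_with_bom_sniffing(with_bom) == "id,name\n1,漢\n"
assert decode_with_bom_sniffing(legacy) == "id,name\n1,café\n"
# BOM-less UTF-8 falls through to cp1252: bytes C3 A9 come out as "Ã©".
assert decode_with_bom_sniffing(no_bom_utf8) == "Ã©"
```

On the writing side, a Python program would opt in by opening the output file with `encoding="utf-8-sig"`, which prepends the BOM.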
