by MickerNews 2048 days ago
It shouldn't need a BOM. Besides, UTF-8 doesn't need a BOM at all.

It doesn't, but only if you assume all text is UTF-8. You can't do that on Windows, because there are plenty of legacy files that use encodings like windows-1252, which are incompatible with UTF-8.
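To make the incompatibility concrete: in windows-1252, `é` is the single byte 0xE9, but in UTF-8 that byte announces a three-byte sequence, so a strict UTF-8 decoder rejects it. A minimal Python sketch (Python's codec name for windows-1252 is `cp1252`; `decodes_as_utf8` is a hypothetical helper):

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False


# "café" as a legacy windows-1252 file stores it: é is the single byte 0xE9.
legacy = "café".encode("cp1252")
print(legacy)                   # b'caf\xe9'
print(decodes_as_utf8(legacy))  # False: 0xE9 opens a 3-byte UTF-8 sequence
print(legacy.decode("cp1252"))  # café

# The same text in UTF-8 spends two bytes on é.
print("café".encode("utf-8"))   # b'caf\xc3\xa9'
```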
Start by converting them to UTF-8. As an analogy: would you want to work with files that can only be opened properly in, say, Lotus 1-2-3?
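A one-off conversion might look like the following minimal sketch. It assumes every file that is not already valid UTF-8 really is windows-1252; `convert_cp1252_to_utf8` is a hypothetical name, and a real migration would need to know each file's actual source encoding.

```python
import tempfile
from pathlib import Path


def convert_cp1252_to_utf8(path: Path) -> bool:
    """Rewrite a windows-1252 file as UTF-8, in place.

    Returns True if the file was rewritten, False if it already decoded as
    UTF-8 (pure-ASCII files land here too, since they need no change).
    Assumption: any file that is not valid UTF-8 really is windows-1252.
    """
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already UTF-8 (or plain ASCII): leave it alone
    except UnicodeDecodeError:
        pass
    # Python's cp1252 codec raises on the five bytes the code page leaves
    # undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so a mislabeled file fails loudly.
    text = raw.decode("cp1252")
    path.write_bytes(text.encode("utf-8"))  # bytes in, bytes out: newlines untouched
    return True


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "legacy.csv"
    csv_path.write_bytes("name;city\nRené;Zürich\n".encode("cp1252"))
    print(convert_cp1252_to_utf8(csv_path))  # True
    print(convert_cp1252_to_utf8(csv_path))  # False: second run is a no-op
    print(csv_path.read_text(encoding="utf-8"))
```

The "already valid UTF-8" check is a heuristic: a windows-1252 file can occasionally also be valid UTF-8 (the byte pair C3 A9, which windows-1252 shows as "Ã©", is one example), and in that rare case the sketch leaves the file untouched.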
The time to drop that support is long past. UTF-8 has won. Weigh the value of pandering to these ancient code pages against the cost of choking, or displaying mojibake, just because someone dared to add a kanji character to a CSV; the answer should be clear by now.
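Both failure modes are easy to reproduce. In the minimal sketch below (hypothetical helper names, windows-1252 standing in for "the legacy code page"), kanji simply cannot be saved as windows-1252, and a UTF-8 file opened as windows-1252 turns into mojibake:

```python
def misread_as_cp1252(text: str) -> str:
    """Simulate a legacy reader that opens a UTF-8 file as windows-1252.

    Note: Python's strict cp1252 codec raises on the five undefined bytes,
    so some inputs fail outright instead of turning into mojibake.
    """
    return text.encode("utf-8").decode("cp1252")


def fits_in_cp1252(text: str) -> bool:
    """Return True if `text` can be saved as windows-1252 at all."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


print(misread_as_cp1252("café"))  # cafÃ©
print(misread_as_cp1252("漢字"))   # six unrelated Western characters, no kanji left
print(fits_in_cp1252("café"))     # True
print(fits_in_cp1252("漢字"))      # False: the code page has no kanji
```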
UTF-8 has won on the web and in file storage. For programming (C#, probably Java too, the Windows API) it’s UTF-16. The good part is that it’s a Unicode encoding; the bad part is that it’s variable-width, yet looks fixed-width for common characters.
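The "looks fixed-width" trap comes from surrogate pairs: every character up to U+FFFF fits in one 16-bit code unit, but anything above it (most emoji, for example) needs two. A minimal Python sketch; the helper names are hypothetical, and the code-unit count is the same quantity that C#'s `String.Length` and Java's `String.length()` report:

```python
def utf16_code_units(text: str) -> int:
    """Count UTF-16 code units (what C#'s String.Length and Java's String.length() count)."""
    return len(text.encode("utf-16-le")) // 2  # 2 bytes per code unit, no BOM emitted


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a pair")
    offset = code_point - 0x10000    # 20 bits of payload
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


for ch in ["A", "é", "漢", "😀"]:
    # Python's len() counts code points; UTF-16 needs two units for the emoji.
    print(f"U+{ord(ch):04X}: {len(ch)} code point, {utf16_code_units(ch)} UTF-16 unit(s)")

print([hex(u) for u in surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```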
> For programming (C#, probably Java too, the Windows API) it’s UTF-16.

For programming it's UTF-8, except for Windows-centric developers (and Microsoft technologies).

Java uses WTF-16 internally but rarely assumes an external charset. When it does, older APIs tend to use the default charset, which is generally ASCII-compatible at best: on Western Windows it's commonly windows-1252, though it's been UTF-8 for years on most Unices. Newer APIs like Files.newBufferedReader(Path) go straight to UTF-8.

And what happens to all the legacy Office versions out there, or all the legacy CSVs floating around? What you described works fine on a rolling-release distro, but would be unacceptable for enterprise software. The current approach (opt-in UTF-8, with a BOM to disambiguate UTF-8 files from non-UTF-8 ones) is the best option that doesn't piss off their customers.
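The opt-in rule can be sketched in a few lines. This is a minimal illustration, not any Office API: `read_legacy_or_utf8` and `write_utf8_with_bom` are hypothetical names, and windows-1252 is assumed as the legacy fallback. A file that starts with the UTF-8 BOM (bytes EF BB BF) is read as UTF-8; anything else goes down the legacy path.

```python
import codecs


def read_legacy_or_utf8(data: bytes) -> str:
    """Decode bytes under an 'opt-in UTF-8 via BOM' rule.

    A leading UTF-8 BOM (EF BB BF) marks the file as UTF-8; anything else is
    treated as the legacy code page (windows-1252 is assumed here).
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode("cp1252")


def write_utf8_with_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading BOM, opting in to UTF-8."""
    return text.encode("utf-8-sig")  # Python's codec that prepends EF BB BF


modern = write_utf8_with_bom("name,漢字")
legacy = "name,café".encode("cp1252")
print(read_legacy_or_utf8(modern))  # name,漢字
print(read_legacy_or_utf8(legacy))  # name,café

# The cost of the rule: BOM-less UTF-8 falls through to the legacy path.
print(read_legacy_or_utf8("café".encode("utf-8")))  # cafÃ©
```

The last line shows the trade-off being argued about: UTF-8 itself doesn't need a BOM, but under this rule a BOM-less UTF-8 file is indistinguishable from a legacy one and gets misread.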