Hacker News
by MickerNews 2049 days ago
The time to drop that support is long past. UTF-8 has won; the value of pandering to these ancient codepages versus the cost of choking and displaying mojibake just because someone dared to add a kanji character to a CSV should be clear by now.
2 comments

UTF-8 has won on the web and file storage. For programming (C#, probably Java too, the Windows API) it’s UTF-16. The good point is that it’s a Unicode encoding; the bad point is that it is variable-size but looks fixed-size for common characters.
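To make the "looks fixed-size" trap concrete, here is a minimal sketch that counts UTF-16 code units per character. The helper name `utf16_units` is made up for illustration; it only relies on Python's built-in `utf-16-le` codec.

```python
def utf16_units(text: str) -> int:
    """Return how many 16-bit code units `text` needs in UTF-16."""
    # utf-16-le emits no BOM, so every 2 bytes is exactly one code unit.
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    # "A" and the kanji are in the Basic Multilingual Plane: one unit each.
    # The emoji is outside it and needs a surrogate pair: two units.
    for ch in ["A", "漢", "😀"]:
        print(repr(ch), "code points:", len(ch), "UTF-16 units:", utf16_units(ch))
```

Code that indexes by code unit works for the first two characters and silently splits the third, which is why the bug tends to show up late.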
> For programming (C#, probably Java too, the Windows API) it’s UTF-16.

For programming it's UTF-8, except for Windows-centric developers (and Microsoft technologies).

Java uses WTF-16 internally but rarely assumes an external charset, and when it does, older APIs tend to use the default charset (which is generally ASCII-compatible at best: on western Windows it's commonly windows-1252, though it's been UTF-8 for years on most unices). Newer APIs like Files.newBufferedReader(Path) go straight to UTF-8.

And what happens to all the legacy Office versions out there, or all the legacy CSVs floating around? What you described works fine on a rolling-release distro, but would be unacceptable for enterprise software. The current approach (opt-in UTF-8 files, with a BOM to disambiguate between UTF-8 and non-UTF-8) is the best option that doesn't piss off their customers.
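A rough sketch of what that BOM-based opt-in could look like on the reading side. This is not how Office actually implements it; `decode_csv_bytes` and the `cp1252` fallback are assumptions picked for illustration.

```python
import codecs


def decode_csv_bytes(raw: bytes, legacy_encoding: str = "cp1252") -> str:
    """Decode CSV bytes: UTF-8 if a BOM is present, else a legacy codepage.

    `legacy_encoding` is an assumed default; a real application would take it
    from the system locale or a user setting.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # The file opted in to UTF-8: strip the 3-byte BOM and decode.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: treat it as a legacy file so old CSVs keep opening as before.
    return raw.decode(legacy_encoding)


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + "名前,漢字\n".encode("utf-8")
    legacy = "name,café\n".encode("cp1252")
    print(decode_csv_bytes(with_bom), end="")
    print(decode_csv_bytes(legacy), end="")
```

The tradeoff is the one being argued about in this thread: a UTF-8 file saved without a BOM falls into the legacy branch and comes out as mojibake (or raises an error on bytes the codepage does not define), while existing legacy files are never misread.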