Hacker News
by bobbyi_settv 4265 days ago
Are UTF-8 encoded Excel documents actually common? Do they even exist? I thought Excel used CP 1252 on English Windows and the corresponding code pages on other language versions?
3 comments

Two formats are generally recognized as XLS: Excel 5.0/95 ("BIFF5") and Excel 97-2003 ("BIFF8"). The former uses a language-specific code page such as 1252; the latter can use either a language-specific code page or the more general code page 1200 (UTF-16LE).

Here is the master list of codepages used by Excel: https://github.com/SheetJS/js-codepage/blob/master/excel.csv (disclaimer: I built this as part of the in-browser XLS parser https://github.com/SheetJS/js-xls)
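
To make the code page distinction concrete, here is a rough Python sketch of decoding raw string bytes by code page number. The helper name `decode_cell` is made up for illustration, and the "cpNNNN" mapping is an assumption that only holds for the common Windows code pages (1252, 1251, and so on), not for every entry in the list above.

```python
def decode_cell(raw: bytes, codepage: int) -> str:
    """Decode raw string bytes using an Excel-style code page number.

    Hypothetical helper: code page 1200 is treated as UTF-16LE; any other
    number is assumed to map to Python's "cpNNNN" codec, which is only
    true for the common Windows code pages.
    """
    encoding = "utf-16-le" if codepage == 1200 else f"cp{codepage}"
    return raw.decode(encoding)


# The same text stored two ways: one byte per character in CP 1252,
# two bytes per character in UTF-16LE.
legacy_bytes = "café".encode("cp1252")
wide_bytes = "café".encode("utf-16-le")
```

Both `decode_cell(legacy_bytes, 1252)` and `decode_cell(wide_bytes, 1200)` give back the same Python string, which is why the on-disk encoding stops mattering once the parser has decoded it.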

I'm pretty sure that xlrd decodes it all to unicode() in Python, so that should be a moot point. You would only need to worry about passing it as utf-8 to Vim at that point.
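
A minimal sketch of that last step, assuming the cell values have already been decoded to Python strings (`str` in Python 3, `unicode` in Python 2). The `rows` argument is a stand-in for whatever the parser returns and `rows_to_utf8_csv` is a made-up name; only the standard library is used here, not xlrd itself.

```python
import csv
import io


def rows_to_utf8_csv(rows) -> bytes:
    """Serialize already-decoded rows to CSV and encode the result as UTF-8."""
    buf = io.StringIO(newline="")  # let the csv module control line endings
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")


# Stand-in data: two cells that no single Windows code page could hold together.
sample = rows_to_utf8_csv([["café", "Привет"]])
```

Writing `sample` to a file in binary mode gives something Vim can open with `fileencoding=utf-8`.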
How would it save a document containing multiple languages, then?
Excel 97-2003 (XLS) actually uses UTF-16LE in that case, not UTF-8. Excel 2007+ XLSB uses UTF-16LE exclusively; there is no way to force it to use a code page.
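
A quick way to see why a single code page cannot hold mixed-language text, sketched in Python. The helper `fits_codepage` and the sample string are illustrative only; this shows the encoding limitation, not Excel's actual decision logic.

```python
def fits_codepage(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Western European and Cyrillic characters in one string.
mixed = "café Привет"

in_cp1252 = fits_codepage(mixed, "cp1252")      # CP 1252 has no Cyrillic
in_cp1251 = fits_codepage(mixed, "cp1251")      # CP 1251 has no "é"
in_utf16le = fits_codepage(mixed, "utf-16-le")  # UTF-16LE covers both
```

Only the UTF-16LE check succeeds, so a writer that has to store such a string needs a Unicode encoding.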