Hacker News new | ask | show | jobs
by rspeer 3573 days ago
I have never seen anyone with an Excel workflow that involves exporting to CSV that does not immediately mojibake every non-ASCII character.

Yeah, I know there's a well-hidden option to export in tab-separated UTF-16. Nobody uses it.

So I guess you're fine if your data is just numbers, and if it's not labeled with words, names, or places.

4 comments

Excel also messes with CSV values that look like integers but aren't (think serial numbers or IDs), leading to many frustrating days of support.
Not to mention the famous mangling of gene names that look like dates. Yes, I know scientists shouldn't be using Excel -- I agree. But I'm a computational biologist. Experimentalists know how to use exactly three programs 1) Microsoft Word 2) Microsoft Excel 3) Microsoft Powerpoint.
I find myself working with four letter codes to identify wards and facilities at my work, inevitably I have to compare my SQL output to someone's spreadsheet. No excel, I didn't mean March 02, just give me MAR2 like I asked. Drug codes that start with a '.' are another headache, they will be auto formatted as a decimal number. Worse is when they want to put info from a spreadsheet in to the database, no one understands the need for consistent field formatting :'(
I feel your pain. We have a lot of ID values with leading zeroes, which Excel loves to strip, and data with numbers in parentheses, which Excel's CSV loader thinks should be negative numbers (this is apparently common in book-keeping).

I keep getting sent spreadsheets which were made from CSV data by people who didn't know how to set the columns to text - to be fair, if the file extension is .csv then Excel won't even let them set the column formats - and having this latter category turn up as -ve numbers not only looks weird but ruins the sorting. Even 1, 11, 2 is better than sorting "backwards".

Have you tried bringing in with PowerQuery instead? Much more flexible.
I haven't tried pulling Excel -> SQL with PowerQuery yet, only really dabbled in presenting data the other way. Do you know if this works easily with a document that utilises a lot of macros and VBA? Or are we still talking flat CSV.
What are you talking, Excel with VBA is importing/consuming the data, or it is the datasource for something else?

I assume Excel with VBA is importing/consuming the data in which case Power Query / Power Pivot would likely solve most of your problems. Refactoring a large complicated existing spreadsheet to use PQ rather than your existing peocess, hard to say how tough that would be.

i was reading a paper about how Excel gets some standard statistical tests wrong too - the US Stats Association i think did a compare/contrast with a range of packages. some of these bugs have been there since 2002 or so. i lived in Excel for many years but wasn't a power user - i got to like it but sadly, it got used for everything even when it really, really shouldn't have.
I read an article on that, but it was clearly mentioned that this was an issue with older versions of excel (excel 03 07? iirc).
this link seems quality and you're right - seems like the excel team took it seriously in... 2010. that's still pretty bad!

http://www.practicalstats.com/xlsstats/excelstats.html

There is also a pretty gnarly bug in excel when it tries to import a CSV with the asci string "ID" as the first two characters.

http://superuser.com/questions/210027/why-does-excel-think-c...

Very Frustrating

Hacker news discussion on that bug:

https://news.ycombinator.com/item?id=12041210

Oh this was fun.

Excel has issues with numerical figures greater than 13 digits, and can't distinguish from a serial number in a csv or an integer.

Importing a small (30k rows of so) set of of tax information thus became hell.

Worst part is that I didn't find it till halfway through the migration. I love excel on average, but that was just a painful fight.

Often, you can strip the BOM (which isn't supposed to be there to begin with), convert from excel's proprietary character set to UTF-8, and translate the line endings and things may work - if it hasn't decided to randomly change things that weren't dates into dates, strip leading 0's from your zip codes, reformat things that you didn't want reformatted, and otherwise mangle your data. (Nevermind that Excel can't handle dates properly anyway.)

What's worse is that many people play with excel and use things like coloring, fonts, and visual layout to distinguish things, and/or combine multiple unrelated chunks into one sheet so that they won't have to switch tabs (which of course makes for a scrambled mess of data). I try to suppress the memories of it taking 36 tries to get a usable CSV from business people (after having given clear instructions in advance and doing postprocessing afterward to clean it up).

Excel should never be used for working with data. But it is, so that's job security if you can clean up the messes that it makes.

> So I guess you're fine if your data is just numbers, and if it's not labeled with words, names, or places.

Whoops, busted. I very rarely deal with localised data so I can get away with ASCII. Excellent point, though, my case doesn't generalise as well as I thought it did.

I use LibreOffice just for this purpose. ie Creating csv files from xls files. Open Excel file in LibreOffice Save as text >> UTF-8 >> quote all text cells. Works perfectly every time, Excel is brilliant for most things but is useless with csv, all kinds of strange bugs will arise.