Hacker News new | ask | show | jobs
by ryzvonusef 1045 days ago
Why does Excel autoformat data from a CSV? what is the rationale behind this? How hard it is to have an option that says "open with no formatting"?
2 comments

You can import the CSV file instead of opening it, which lets you choose the data type for each column. It's clunky, but it does what you want (if you set the types correctly).

Excel is a known issue for many text data files. My least favorite Excel-ism is when it changes the names of genes to dates. Many genes recently changed names to avoid Excel-related issues. As an example: OCT4 became POU5F1, and SEPT1 became SEPTIN1. Otherwise, if you're looking at gene related data, you'd have to be very careful when looking at gene names that you didn't accidentally save the file with a few "dates" as opposed to gene names. (Which is even more confusing when you realize that dates are stored as integers in Excel).

I was doing a coursera project and it kept raising duplicates from what I thought were random strings of numbers and characters and therefore should be unique, took me a while to figure out what was happening:

https://sites.google.com/view/ryzvonusef/process

Wasted a week's worth of effort, but I learnt a valuable lesson.

Spoiler: Data read into Excel/Google sheets was interpreted as scientific notation, this caused uniqueness errors.
My least favorite Excel-ism is when it changes correct dates to american dates.

https://xkcd.com/1179/

Depending on whether you open a CSV file with Excel or import it into it, different things work and don't work. This has been a problem especially on a Windows with a locale that uses a different separator, e.g. semicolon instead of comma on a German Windows because the comma is used in numbers as decimal separator.

Unicode encoding or rather anything non-ASCII is also prone to cause trouble. The automatic conversion of quoted(!) digit strings to numbers can be worked around by appending a tab character. Line breaks within a cell are a big headache, however, and I believe the U+2028 code point can still not be used instead.