Hacker News new | ask | show | jobs
by zarzavat 682 days ago
Just use TSV. Commas are a terrible delimiter because many human strings have commas in them. This means that CSV needs quoting of fields and nobody can agree on how exactly that should work.

TSV doesn’t have this problem. It can represent any string that doesn’t have either a tab or a newline, which is many more than CSV can.

4 comments

It's 2024 and Excel still doesn't natively parse CSV with tabs as delimiters. When I send such csv files to my colleagues, they complain about not being able to open them directly in Excel. I wish Excel could pop up a window like LibreOffice does to confirm the delimiter before opening a csv file.
Excel does not support any delimeter natively since its region dependent.

I ended up saving my mental heath by supporting two different formats: "RFC csv" and "Excel csv". On excel you can for example use sep=# hint on beginning of file to get delimeter work consistently. Sep annotation obviously break parsing for every other csv parser but thats why there is other format.

Also there might be other reasons too to mess up with file to get it open correctly on excel. Like date formats or adding BOM to get it recognized as utf-8 etc. (Not quite sure was BOM case with excel or was it on some other software we used to work with )

I also use sep= annotation. That is not documented ANYWHERE by Microsoft I assume one of the devs mentioned this in a mailing-list sometime in the nineties and it has found its way around.

Still... Shame on Microsoft of not documenting this and perhaps other annotations that one can use for ex El.

I am quite sure that Excel import option has tabs as delimeters option.

https://support.microsoft.com/en-us/office/import-or-export-...

https://support.microsoft.com/en-us/office/text-import-wizar...

"Delimiters Select the character that separates values in your text file. If the character is not listed, select the Other check box, and then type the character in the box that contains the cursor."

Maybe they should know better their tools instead of plain double clicking and hope for the best.

It's 2024 and people still haven't realized that Excel does not and never will support opening CSV files. The closest thing it allows you to do is import data from a CSV file into your current spreadsheet, but open a CSV file? It will never do that. Stop using CSV for excel, just generate .xlsx files like everyone else.
I can double click on a csv and it opens pretty cleanly in Excel. I don’t use it systematically but I usually eyeball csvs using Excel.
Not in every version. I recently found out that Excel doesn't recognize commas as separators in a comma-separated-values file on my coworkers PCs.

I presume it's because Germany uses the comma as a decimal separator instead of a dot.

I eventually settled on just exporting Excel because I couldn't get both the encoding and separator to work at the same time.

Another fun story is that a coworker lost data, when they opened a csv, wrote data to a second sheet, and then saved it. A sane program would probably have brought up a save-as window. Excel didn't. It just discarded the second sheet.

I don’t like exporting excel because that’s harder to read. I don’t make the data for Excel. I make the data portable.

The fact that it opens in Excel is just convenient. If Excel doesn’t work, I’d use BBEdit or one of a billion other clients that read CSV.

Far fewer clients can open Excel files. Especially 10 or 100 years in the future.

If Excel is set to handle the extension .csv then attempting to open a .csv file correctly launches Excel and imports it. File for read only, but if you want it back out you have force matters, it's not automatic.
Excel is the best tool out there but it has its quirks.

For example the web version doesn't have a dark mode. Google sheets and docs these days is more useful and feature rich than Excel.

Feature-wise, Excel probably still has more options, but in terms of ergonomics, Google Sheets is much better. And I'm saying this as someone who has used Excel for 20 years.

Here are a few specific examples:

1. Editing formulas using the keyboard only is a nightmare in Excel. It often randomly throws errors and warnings when I move the cursor around (like typing parentheses or quotes first and then trying to move back to type text inside, etc.) before finishing editing.

2. Conditional formatting in Excel is so non-intuitive that I actively try to avoid it like the plague. Yet, I use it extensively in Google Sheets because it is so easy to create multiple rules there.

3. The whole copy/paste design choice in Excel is, in my opinion, weird. Firstly, there is a distinction between copying a cell and copying text: if you copy an entire cell, you cannot paste it as text in a formula or any other input area. You have to copy from the formula bar of that cell. Even for pure cell copying, the cells have to remain highlighted. If you copy a cell and then unselect it (by pressing Esc or trying to edit any cells), the copied content is lost. I'm sure there are reasons it's designed this way, but it's so irritating, and I never find any benefit.

Number 1 has a simple solution, pressing F2 is your friend.
Will try it. But it shouldn't need that in the first place. It's just bad UX
Google Sheets indeed does everything a common user would expect from a spreadsheet, without having to install anything and fiddle with licenses. This in itself is the killer feature.

For me personally the absolute killer feature is the bidirectional integration with BigQuery, something you won’t get that easily (if at all, correct me if I am wrong) with Excel.

As long time Office user since Windows 3.1 days, Google sheets has a lot of terrain to cover up.

Quattro Pro could easily do all of the sheets stuff but document collaboration, ignoring the fact that sheets is a Web application and Quattro Pro started on MS-DOS.

The one flaw I do see with them is blank cells

If you go with the CSV convention of two adjacent tabs => blank cell in the middle, then rows of different length will not line up properly in most text editors. And "different length" depends on the client's tab width too

If you allow any amount of tabs between columns, then you need a special way to signify an actually-blank column. And escaping for when you want to quote that

If you say "use tabs for columns and spaces for alignment", then you've got to trim all values, which may not be desirable

You’re talking about issues with alignment when data is displayed on a terminal or text editor, which is not at all related to data exchange.

In data exchange nobody ever allows multiple tabs between columns. If there are multiple tabs with nothing in between it means the column is empty for that row.

Just like with CSV, TSV, is always a pain to edit manually so the issues there are the same. Using tabs does have a lower likelihood of conflicting with the actual data.

This is true, but I'd assumed that one of the major reasons to use TSV is for human readability. If not, then I'd personally choose an even rarer character as my delimiter
That's what I always use when I need to write out some tabular data. Haven't had any problem importing them into anything.
In that case, why not use "|" (pipe character)?
Tabs are rarer than pipes. They are the rarest displayable character for human strings excluding code.

The best would be to use ASCII separator control characters but nobody uses that format so TSV it is.

Based on what data?