Hacker News new | ask | show | jobs
by impure 692 days ago
I switched to TSV files for my app. None of my values contain tabs so I don't have to escape anything.
3 comments

I tend to prefer that over CSV as well. But usually I go for ndjson files since that's a bit more flexible for more complex data and easier to deal with when parsing. But it depends on the context what I use.

However, a good reason to use TSV/CSV is import/export in spread sheets is really easy. TSV used to have an obscure advantage: google sheets could export that but not CSV. They've since fixed that and you can do both now.

And of course, getting CSV out of a database is straightforward as well. Both databases and spreadsheets are of course tabular data; so the format is a good fit for that.

Spreadsheets are nice when you are dealing with non technical people. Makes it easier to involve them for editing / managing content. Also, a spread sheet is a great substitute for admin tools to edit this data. I once was on a project where we payed some poor freelancer to work on some convoluted tool to edit data. In the end, the customer hated it and we unceremoniously replaced that with a spreadsheet (my suggestion). Much easier to edit stuff with those. They loved it. The poor guy worked for months on that tool with the help of a lot of misguided UX, design and product management. It got super complicated and it was tedious to use. Complete waste of time. All they needed was a simple spreadsheet and some way to get the data inside deployed. They already knew how to use those so they were all over that.

If you have non technical people please for the love of god start using .xlsx directly.

Nobody on this planet wants to use e.g. Libre office to import your CSV file and save it as xslx so they can open it in Excel.

Excel can open, modify and save CSVs, in my experience some users won’t even notice they’re not editing a native excel file.
The ASCII specification defines characters for separating fields, groups, records, and files, but I've rarely seen them used.
That's because anyone can easily make a tab character with their keyboard. No one ever remembers the key combination for those special ascii characters.
If it became popular, all common editors would have an easy way to type them.
This encapsulates my problem with CSVs:

- If I send someone a spreadsheet, they'll open it with a spreadsheet application; Excel, LibreOffice, whatever.

- If I send someone a CSV file, they'll want to open it with a text editor.

Ack, no! Open it with a spreadsheet app, or load it into SQLite, or, best of all, open it with VisiData or some kind of editor designed for tabular data.

https://www.visidata.org/

Actually no - spreadsheets classically choose their own way to interpret CSVs, that's the classic way to get your client to continue to send you support requests.

There's a reason so many tools export to xls instead of csv.

And if you just double click on a csv file to open it in excel (rather than importing from excel), excel will happily corrupt your data. Trim leading zeros in IDs, round large integers, not spot the date column is in us vs european format, etc.
And if there were an easy way to type them, then people would start using them in strings.

And then they'd have to be escaped.

And we'd be back where we started.

While that's true, the way text editors handle these characters is not standardized, and many may not let you input them. One of the important features of CSV/TSV is that they're relatively easy to edit by hand, and for that you need separator characters that are easy for both text editors and humans to work with.

Personally, since I've discovered the field/group/record/file separator characters in ASCII, I've been using them to concat fields and rows on one-to-many SQL joins. They work great for that purpose since (at least on all the projects I've done this with so far) I can be certain that none of the values in the joined data will have those characters, so no further escaping is necessary. For example, in MySQL:

  SELECT
    i.item_id,
    GROUP_CONCAT(CONCAT_WS(0x1F, f.field_id, f.field_value) SEPARATOR 0x1E) AS field_values
  FROM items i 
  LEFT JOIN fields f ON f.item_id = i.item_id
  WHERE ...
Then split field_values with 0x1E to get each field ID and field value pair, and split each of those on 0x1F. Easy as pie.
How do you escape newline characters?