Hacker News
by maxcoder4 805 days ago
I don't understand the question about empty values. A row with three empty values is just:

,,

And yeah, dealing with separators is annoying, but pretty much every reader supports quoted values and escapes (not every writer bothers to add them, though).
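To make the quoting point concrete, here is a minimal sketch using Python's standard `csv` module. The helper names `write_row` and `read_row` and the sample fields are made up for illustration; other readers and writers may behave differently.

```python
import csv
import io

def write_row(fields):
    """Serialize one row with the csv module's default dialect."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()

def read_row(text):
    """Parse the first record from a CSV string."""
    return next(csv.reader(io.StringIO(text)))

# Fields containing the delimiter, a quote character, and a newline.
tricky = ["plain", "has,comma", 'has "quote"', "has\nnewline"]

encoded = write_row(tricky)   # the writer quotes and escapes where needed
decoded = read_row(encoded)   # the reader undoes the quoting

print(repr(encoded))
print(decoded == tricky)

# A bare ",," parses as three empty strings.
print(read_row(",,"))
```

The round trip only works because the writer quoted the awkward fields. A writer that joins values with commas and nothing else would produce a row this reader splits differently.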

In practice I use CSV to process large amounts of tabular data (logs, events, etc.) where I care about greppability and performance more than about the potential for 0.001% of corrupted data. YMMV of course; if getting it 100% right is important, use something else (JSON is not without sins either: consider that JSON numbers are often parsed as floats).
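The JSON float issue can be sketched in Python. Python's own `json` module keeps integers exact, so this simulates a parser that stores every number as a double by passing `parse_int=float`. The `id` field and the value chosen are just illustrative assumptions.

```python
import json

# 2**53 + 1 is the smallest positive integer a 64-bit double cannot represent.
big = 2**53 + 1
text = json.dumps({"id": big})

# Python parses JSON integers as arbitrary-precision ints, so nothing is lost.
exact = json.loads(text)["id"]

# Simulate a double-only parser by converting every integer literal to float.
as_double = json.loads(text, parse_int=float)["id"]

print(exact == big)           # value survives with an int-aware parser
print(int(as_double) == big)  # value is silently changed with doubles
```

Large IDs and counters are where this tends to bite, since the value changes without any parse error.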

2 comments

Is it ''?

Or is it ,,,,,,,

Or is it tabtabtabtab

Or perhaps it's "NULL,NULL,NULL"

Part of my work is data ingestion and I've seen all these (and more) as answers to the "empty values" question.

I'm not saying that other formats are without their problems; they certainly have them. However, CSV doesn't just have those problems, it has multiple other problems on top of them.

It's a basic idea with really obvious edge cases addressed in multiple ways depending on who is producing these documents.

CSV is extremely underspecified, but that doesn't mean it deserves the blame for software that fails to implement even the one thing it inherently specifies. A sequence of tabs is one value in a CSV. A sequence of semicolons is one value in a CSV. Any software that thinks otherwise is buggy, and unfortunately that includes some extremely popular software that is supposed to be good at tables (Excel).
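As a rough illustration of the "one value" claim, this is how Python's standard `csv` reader treats those inputs with its default comma delimiter. This shows one reader's behavior, not a statement about every implementation.

```python
import csv

tabs = "\t\t\t\t"
semicolons = ";;;"

# With the default comma delimiter, each line is a single field.
tabs_as_csv = next(csv.reader([tabs]))
semis_as_csv = next(csv.reader([semicolons]))

# Only when the caller explicitly asks for tab-separated parsing
# do four tabs become five empty fields.
tabs_as_tsv = next(csv.reader([tabs], delimiter="\t"))

print(tabs_as_csv)
print(semis_as_csv)
print(tabs_as_tsv)
```

The difference comes from the caller stating the delimiter up front. Software that guesses the delimiter from the content is where the ambiguity creeps in.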
I would guess that their point is that CSV doesn't have any official null-vs-empty-string differentiation mechanism.
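A quick sketch of that with Python's standard `csv` module, which writes `None` as an empty field. The sample row is made up, and other tools pick other conventions (such as a literal NULL), which is exactly the problem.

```python
import csv
import io

# One null, one empty string, one real value.
row = [None, "", "x"]

buf = io.StringIO()
csv.writer(buf).writerow(row)
encoded = buf.getvalue()

# Reading it back: both the null and the empty string come back as "".
decoded = next(csv.reader(io.StringIO(encoded)))

print(repr(encoded))
print(decoded)
```

After the round trip there is no way to tell which of the first two fields was originally null, so any distinction has to come from a convention agreed outside the file.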