by 01HNNWZ0MV43FF 683 days ago
Because json5L hasn't caught on yet and everything else has obvious flaws

I routinely interface with 1GB+ csvs. The size explosion for json would be huge. Disk IO aside, I assume a json parser is going to be slower to parse than csv.
How would JSON cause a size explosion?

Nothing prevents you using ndjson where you define a header and then have an array per line.
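For concreteness, here is a minimal sketch of the layout described above: the first line is a JSON array of column names and every later line is a JSON array of values. The function names `write_rows` and `read_rows` and the sample columns are made up for illustration; this is not an established convention or library API.

```python
import io
import json


def write_rows(fp, header, rows):
    """Write a header array, then one JSON array per record, one per line."""
    fp.write(json.dumps(header) + "\n")
    for row in rows:
        fp.write(json.dumps(row) + "\n")


def read_rows(fp):
    """Yield each record as a dict keyed by the names on the header line."""
    header = json.loads(fp.readline())
    for line in fp:
        if line.strip():  # tolerate a trailing blank line
            yield dict(zip(header, json.loads(line)))


# Hypothetical sample data, written to an in-memory buffer instead of a file.
buf = io.StringIO()
write_rows(buf, ["id", "name", "score"], [[1, "ada", 9.5], [2, "bob", None]])
buf.seek(0)
records = list(read_rows(buf))
```

Only the standard `json` module is needed, but the "first line is the header" rule is something reader and writer have to agree on out of band.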

Nobody does this currently. You have now created another bespoke format. If I am going to need a custom parser/writer, I might as well lean on a binary format that has far stronger properties than a text based one.
JSONL is a pretty common format. It makes sense for logs and anything else written incrementally.

JSON parsers are super common. They are simpler and faster than CSV parsers because the format is more regular. JSONL is simple to implement because you write by record and read by line.

The only differences from CSV are the bracket characters around each line and the quotes around every string. The benefit is clear escaping rules, including for newlines.
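A small sketch of the newline point, using Python's standard `csv` and `json` modules. The sample record is invented. A quoted CSV field containing a newline spans two physical lines, so a naive line-by-line reader splits the record, while the JSON encoding escapes the newline and stays on one line.

```python
import csv
import io
import json

# Hypothetical record: one field holds a newline, another holds quotes.
record = ["42", "line one\nline two", 'she said "hi"']

# CSV: the newline is kept literally inside a quoted field.
csv_buf = io.StringIO()
csv.writer(csv_buf, lineterminator="\n").writerow(record)
csv_text = csv_buf.getvalue()

# JSONL: the newline is escaped as \n, so the record is one physical line.
json_text = json.dumps(record) + "\n"

csv_physical_lines = csv_text.splitlines()
json_physical_lines = json_text.splitlines()

# Both formats round-trip, but only with the right reader for each.
csv_roundtrip = next(csv.reader(io.StringIO(csv_text, newline="")))
json_roundtrip = json.loads(json_physical_lines[0])
```

A real CSV parser handles the quoted newline fine; the difference only bites tools that split on line breaks first, such as `wc -l`, `split`, or a hand-rolled `line.split(",")`.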

JSONL is standard. Upthread said to write the header row and then make subsequent rows arrays, and I am not aware of anything that does this currently.

My objection to JSONL was about the increase in file size owing to repeating the keys.

JSON can write arrays in addition to hashes. JSON arrays are nearly identical to CSV. The only difference is brackets around lines. There is no extra space wasted for keys.
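To put rough numbers on the size argument, here is a sketch that serializes the same made-up table three ways and counts bytes. The column names, row count, and helper names are all hypothetical, and real data with long strings or many columns will shift the ratios.

```python
import csv
import io
import json

# Hypothetical table: 1000 rows with an int, a short string and a float.
header = ["id", "name", "score"]
rows = [[i, f"user{i}", i * 0.5] for i in range(1000)]
COMPACT = (",", ":")  # no spaces after separators


def csv_size():
    """Bytes for a header line plus one CSV line per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def jsonl_objects_size():
    """Bytes for one JSON object per line, repeating every key on every line."""
    lines = (json.dumps(dict(zip(header, r)), separators=COMPACT) for r in rows)
    return len("".join(line + "\n" for line in lines).encode("utf-8"))


def jsonl_arrays_size():
    """Bytes for a header array followed by one JSON array per line."""
    lines = [json.dumps(header, separators=COMPACT)]
    lines += [json.dumps(r, separators=COMPACT) for r in rows]
    return len("".join(line + "\n" for line in lines).encode("utf-8"))


sizes = {
    "csv": csv_size(),
    "jsonl_arrays": jsonl_arrays_size(),
    "jsonl_objects": jsonl_objects_size(),
}
```

For this table the array form costs two brackets per line plus two quotes per string over CSV, while the object form also pays for every key on every line.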
Why do you use a text-based format at all at that size?
You get what you get. Presumably when it started, they were a more modest size.
Eh, I'm skeptical of this statement.

CSV is explicitly about tabular data. JSON (including JSON5) is much more flexible. Flexibility can be great but also can be annoying. If you want tabular data, then a system that enables nesting isn't great.

I love jsonlines but csvs are way more compact, since you don't have to repeat the column name for every line of data
I think the fact that a human can mostly just read csvs is an important part of their adoption, too.
You would write JSON arrays without names for tabular data. I don’t know if there is a standard way to do the header, but an array of names would work. Or a JSON Schema record.
Rather than highlighting flexibility as the differentiator, I would say: CSV is for dense data, JSON is for sparse data. They are flexible in different ways. For example, CSV is very flexible when renaming a column title.
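A quick sketch of the dense versus sparse distinction, with invented records. CSV has to carry a cell for every column on every row, so sparse data turns into runs of empty fields, while JSON objects simply omit the keys that are absent.

```python
import csv
import io
import json

# Hypothetical sparse records: each one has only some of the possible keys.
sparse = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "phone": "555-0100"},
    {"id": 3},
]

# CSV needs the union of all keys as columns; missing values become empty cells.
columns = sorted({key for rec in sparse for key in rec})
csv_buf = io.StringIO()
writer = csv.DictWriter(csv_buf, fieldnames=columns, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(sparse)
csv_lines = csv_buf.getvalue().splitlines()

# JSONL objects only spell out the keys each record actually has.
jsonl_lines = [json.dumps(rec) for rec in sparse]
```

The flip side is the renaming case: in the CSV a column name appears once, in the header line, whereas in object-per-line JSONL it appears on every line that uses it.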