HN Mirror



	by pbreit 1201 days ago
	Besides XML, JSON is about the worst way to format tabular data, right?

4 comments

mcdonje 1201 days ago

If it's tabular, self-describing formats have way too much overhead. I ran a query with a tabular result in the neighborhood of 100 columns by 215k rows, and exported it in multiple formats:

  - CSV: 166 MB
  - JSON: 795 MB

That said, not all data is tabular.
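A minimal sketch of where that overhead comes from, using only the Python standard library. The table here is made up (the column names, values, and row count are illustrative assumptions, not the query from the comment above), so the exact ratio will differ from the 166 MB vs. 795 MB figures. The point is only that a JSON array of records repeats every column name in every row, while CSV states the names once in the header.

```python
import csv
import io
import json

# Hypothetical table: 1,000 rows by 5 columns (names and values are made up).
columns = ["id", "name", "city", "amount", "active"]
rows = [[i, f"user{i}", "Berlin", i * 1.5, i % 2 == 0] for i in range(1000)]


def to_csv(columns, rows):
    """Serialize as CSV: column names appear once, in the header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def to_json_records(columns, rows):
    """Serialize as a JSON array of objects: column names repeat in every row."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


csv_size = len(to_csv(columns, rows).encode("utf-8"))
json_size = len(to_json_records(columns, rows).encode("utf-8"))

# Print both sizes in bytes and the JSON-to-CSV ratio for this sample.
print(csv_size, json_size, round(json_size / csv_size, 2))
```

With wider tables and longer column names, the repeated keys take up a larger share of each JSON row.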

DuckDB already supports Parquet, which supports structs and is a very good format for storing data for reporting workloads. But JSON is a standard interchange format, so a lot of people are going to want to do something with JSON payloads they receive from API calls.

I could definitely imagine a workload where you receive JSON from an API call, load it into DuckDB or similar to help with ETL, then store results in Parquet.


chundicus 1201 days ago

For me it depends a lot on the context. JSON is often very human-readable (as long as it's not too deeply nested), fairly well defined (compared to CSV), and most languages and software have easy out-of-the-box support for parsing and manipulating it.

If I were building a system that had to deal with large amounts of tabular data that isn't directly consumed by humans, JSON wouldn't be my first choice nor my last.


pbreit 1201 days ago

It's interesting that JSON is still the format of choice for transmitting tabular data to SPAs and mobile apps. Granted, it's likely compressed. But it still seems that something more efficient, like CSV, would be better.
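A rough way to check how much compression changes the picture, sketched with the Python standard library. The payload is invented (column names and values are assumptions for illustration), and gzip stands in for whatever compression a real server would apply. Repeated key names tend to compress well, so the gap usually narrows after compression, but how much depends on the data, so measure rather than assume.

```python
import csv
import gzip
import io
import json

# Hypothetical payload: 2,000 rows by 3 columns (made-up names and values).
columns = ["id", "name", "amount"]
rows = [[i, f"item{i}", i * 0.25] for i in range(2000)]

# CSV encoding: header row followed by data rows.
buf = io.StringIO()
csv.writer(buf).writerows([columns] + rows)
csv_bytes = buf.getvalue().encode("utf-8")

# JSON encoding: an array of objects, one per row.
json_bytes = json.dumps([dict(zip(columns, r)) for r in rows]).encode("utf-8")

# Compress both payloads the same way.
csv_gz = gzip.compress(csv_bytes)
json_gz = gzip.compress(json_bytes)

# JSON-to-CSV size ratio before and after compression.
raw_ratio = len(json_bytes) / len(csv_bytes)
gz_ratio = len(json_gz) / len(csv_gz)
print(f"raw: {raw_ratio:.2f}x, gzipped: {gz_ratio:.2f}x")
```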


lnkuiper 1200 days ago

This is very true. DuckDB supports JSON not because it's a good tabular format, but because JSON is ubiquitous, and there are many use cases where querying JSON dumps for analytics is useful.


AtNightWeCode 1200 days ago

My love for line-based data formats has increased over time: CSV, one JSON string per line, and so on. You can always append to the data, and you can deserialize line by line.
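A small sketch of the JSON-per-line idea in Python, standard library only. The file name, record fields, and helper names (`append_record`, `read_records`) are hypothetical. Appending is just writing one more line, and reading yields one record at a time without loading the whole file.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single JSON line; existing lines are untouched."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_records(path):
    """Yield records one line at a time, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
append_record(path, {"id": 1, "event": "start"})
append_record(path, {"id": 2, "event": "stop"})

records = list(read_records(path))
print(records)
```

This works because `json.dumps` escapes newlines inside strings by default, so each record stays on one physical line.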
