

by mcdonje 1201 days ago

If it's tabular, self-describing formats have way too much overhead. I ran a query with a tabular result in the neighborhood of 100 columns by 215k rows, and exported it in multiple formats:

  - CSV: 166 MB
  - JSON: 795 MB
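Most of that overhead comes from repeated keys. This assumes the JSON export is an array of objects, which is one common layout: every row then repeats every column name, while CSV writes the header once. The minimal sketch below uses only the Python standard library. The column names, integer values, and the `encoded_sizes` helper are all hypothetical. Because the data is synthetic, the ratio it prints will not match the figures above.

```python
import csv
import io
import json


def encoded_sizes(n_rows: int, n_cols: int) -> tuple[int, int]:
    """Return (csv_bytes, json_bytes) for a synthetic n_rows x n_cols table."""
    # Hypothetical column names and integer values; real query output will differ.
    columns = [f"column_{i}" for i in range(n_cols)]
    rows = [[r * n_cols + c for c in range(n_cols)] for r in range(n_rows)]

    # CSV: the header is written once, then only the values for each row.
    csv_buf = io.StringIO()
    writer = csv.writer(csv_buf)
    writer.writerow(columns)
    writer.writerows(rows)

    # JSON as an array of objects: every row repeats every column name as a key.
    records = [dict(zip(columns, row)) for row in rows]
    json_text = json.dumps(records)

    return len(csv_buf.getvalue().encode("utf-8")), len(json_text.encode("utf-8"))


csv_size, json_size = encoded_sizes(n_rows=1000, n_cols=100)
print(f"CSV:  {csv_size:,} bytes")
print(f"JSON: {json_size:,} bytes ({json_size / csv_size:.1f}x)")
```

Widening the table or lengthening the column names makes the gap larger. The header cost in CSV is paid once, but in this JSON layout it is paid once per row.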

That said, not all data is tabular.

DuckDB already supports Parquet, which supports structs and is a very good format for storing data for reporting workloads. But JSON is a standard interchange format, so a lot of people are going to want to do something with JSON payloads they receive from API calls.

I could definitely imagine a workload where you receive JSON from an API call, load it into DuckDB or similar to help with ETL, then store results in Parquet.