by jitl
798 days ago
I think “human readability” isn’t a great feature for a columnar data format, because once you have data at a scale where a column-oriented layout makes sense, you’re way past the scale where a human would want to read over the stored data anyway. Like, no human is going to read 50k rows, much less 10m rows. I guess it’s nice that you can spot check the rows using only zip & head -n 10 and paste, but I don’t think that niceness is a good reason to pick a format that forbids common ASCII characters and doesn’t have widespread support. I guess there’s a sort of perma-computing angle here: this format is simple enough that you could pack a lot of almanac data into it and, given a working zlib, get it back out with very limited dependencies. But given the petabytes of parquet files out there, I feel like parquet is here to stay, much like sqlite is here to stay. EDIT: there is a great CLI tool called duckdb for doing SQL on parquet, csv, sqlite3, and other tabular data formats. Handy for wrangling and analyzing tabular data from 100 to 10m rows and up.
|
Human-readable comes in handy here.