by captrb
2052 days ago
"Parquet files work well, but streaming is a tad more complex (you need to be able to seek to the end of the file to read the metadata before you can stream the contents)" I didn't realize that all the metadata in Parquet was stored at the end. That is indeed unfortunate for streaming use cases. It's especially sad because columnar, dictionary-encoded formats can offer great compaction for some data. I've been achieving 20x+ size reductions by converting from CSV to Parquet.
https://arrow.apache.org/docs/python/ipc.html
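
To make the "seek to the end" point concrete, here is a rough sketch of how a reader might locate the footer. It relies on the Parquet layout as I understand it: the file ends with the Thrift-encoded metadata, then a 4-byte little-endian length, then the magic bytes PAR1. The function name `locate_parquet_footer` is made up for illustration, and it assumes an unencrypted file and a seekable input, which is exactly what a pure stream doesn't give you.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"


def locate_parquet_footer(f):
    """Return (offset, length) of the footer metadata in a seekable binary file.

    Hypothetical helper: it only finds the footer bytes, it does not
    decode the Thrift-encoded metadata inside them.
    """
    # The reader has to jump to the end first; nothing at the start of the
    # file says where the metadata lives.
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    if file_size < 12:  # leading magic + 4-byte length + trailing magic
        raise ValueError("too small to be a Parquet file")

    # Last 8 bytes: 4-byte little-endian footer length, then the magic.
    f.seek(file_size - 8)
    tail = f.read(8)
    footer_len = struct.unpack("<I", tail[:4])[0]
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing trailing PAR1 magic")

    footer_start = file_size - 8 - footer_len
    if footer_start < 4:  # would overlap the leading magic
        raise ValueError("footer length exceeds file size")
    return footer_start, footer_len
```

Only after those bytes are read and decoded does the reader know where the row groups and column chunks are, which is why you can't just start consuming a Parquet file from byte 0 on a socket or pipe.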