by jononor 886 days ago
I find that Parquet works great for time series data in general. We use it a lot for datasets with a few million rows for each of 10-1000 devices, typically with around 10 columns per device. What I am missing is good support for delta encoding, which often gives the best compression for time series data. The format specifies delta encoding for all integer types, but I find it poorly supported in most Python-friendly libraries (fastparquet/polars/duckdb).
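To illustrate why delta encoding tends to help with time series, here is a rough stdlib-only sketch. It is not Parquet's actual DELTA_BINARY_PACKED layout, just the underlying idea: store differences between consecutive values, then compress. The timestamps are made up (one sample roughly every 1000 ms with a little jitter), and the helper names `delta_encode`, `delta_decode` and `compressed_size` are hypothetical.

```python
import struct
import zlib
from itertools import accumulate


def delta_encode(values):
    """Return the first value followed by successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


def compressed_size(values):
    """Pack as little-endian int64 and return the zlib-compressed byte count."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return len(zlib.compress(raw))


# Assumed data: millisecond timestamps, ~1000 ms apart with small jitter.
timestamps = [1_700_000_000_000 + 1000 * i + (i * 7) % 5 for i in range(10_000)]
deltas = delta_encode(timestamps)

plain_size = compressed_size(timestamps)
delta_size = compressed_size(deltas)
print(f"plain: {plain_size} bytes, delta-encoded: {delta_size} bytes")
```

The deltas are small and repetitive, so a general-purpose compressor has much less to do than with the raw, ever-increasing timestamps. Real sensor data will be noisier than this, so the gain will vary.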