Hacker News
by
lsorber
2145 days ago
Have you benchmarked this against pickling those data files? In our experience, parquet's overhead isn't worth it for smaller data files.
4 comments
alfalfasprout
2145 days ago
I just did some benchmarks and it's pretty similar for small files. The difference would only be noticeable if you're serializing a ton of small files.
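For readers who want to try a comparison like this themselves, here is a minimal sketch. It is not the benchmark run above: the 1000-row table, the repeat count, and the helper names (`mean_roundtrip_seconds`, `parquet_dump`, `parquet_load`) are all assumptions made for illustration. It times an in-memory write plus read for pickle and, if pandas and a Parquet engine such as pyarrow are installed, for `to_parquet` / `read_parquet` as well.

```python
import io
import pickle
import timeit


def mean_roundtrip_seconds(dump, load, obj, repeats=20):
    """Mean wall-clock seconds for one dump followed by one load."""
    return timeit.timeit(lambda: load(dump(obj)), number=repeats) / repeats


# Hypothetical "small file": two columns, 1000 rows.
columns = {"id": list(range(1000)), "value": [i * 0.5 for i in range(1000)]}


def parquet_dump(df):
    """Serialize a DataFrame to Parquet bytes in memory."""
    buf = io.BytesIO()
    df.to_parquet(buf)
    return buf.getvalue()


def parquet_load(data):
    """Read Parquet bytes back into a DataFrame."""
    import pandas as pd
    return pd.read_parquet(io.BytesIO(data))


timings = {}
try:
    import pandas as pd

    frame = pd.DataFrame(columns)
    # Raises ImportError if no Parquet engine (pyarrow or fastparquet) is found.
    timings["parquet"] = mean_roundtrip_seconds(parquet_dump, parquet_load, frame)
    timings["pickle"] = mean_roundtrip_seconds(pickle.dumps, pickle.loads, frame)
except ImportError:
    # Fallback: time pickle on the plain dict so the script still runs.
    timings["pickle"] = mean_roundtrip_seconds(pickle.dumps, pickle.loads, columns)

for name, seconds in sorted(timings.items()):
    print(f"{name}: {seconds * 1e3:.3f} ms per round trip")
```

This only measures in-memory round trips on one machine, so disk I/O, compression settings, and the data's shape can all change the picture. Treat the numbers as a starting point, not a verdict.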
lsorber
2141 days ago
Huh, makes a pretty big difference for us. We were using pandas' built-in to_parquet though, which seems to suffer from some overhead.
EdwardDiego
2145 days ago
I'm not surprised. Parquet's columnar encoding and compression won't really kick in significantly for smaller files.
kylebarron
2145 days ago
But with pickling you can only read the data in Python.
cbsmith
2145 days ago
If pickling is what is working best for you, it can't be much data.