by mappu 2383 days ago
Are all compressors simply being run with their default arguments? There's a lot of scope for speed/filesize tradeoffs within a single compressor.

EDIT: You are missing `csv+zstd`? It should obsolete `csv+gzip` at all speeds and compression levels.

There is a Pareto-optimality frontier here - I ran my own testing back in 2016 https://code.ivysaur.me/compression-performance-test/ but the numbers are now a little out of date (e.g. zstd and brotli have both seen a lot of improvements since).
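To make the frontier idea concrete, here is a minimal sketch that sweeps a few levels of several compressors and keeps only the settings that no other setting beats on both time and size. It is not the 2016 test linked above: it uses only compressors from the Python standard library (zlib, bz2, lzma) as stand-ins, and the function names `measure`, `sweep` and `pareto_frontier`, the chosen levels, and the synthetic CSV sample are all assumptions made for illustration.

```python
import bz2
import lzma
import time
import zlib


def measure(name, compress, data):
    """Return (name, seconds, compressed_size) for one compressor setting."""
    start = time.perf_counter()
    out = compress(data)
    return name, time.perf_counter() - start, len(out)


def pareto_frontier(results):
    """Keep settings that no other setting beats on both time and size."""
    frontier = []
    best_size = float("inf")
    # Walk from fastest to slowest; a slower setting only earns its place
    # if it produces a strictly smaller output than everything faster.
    for name, seconds, size in sorted(results, key=lambda r: (r[1], r[2])):
        if size < best_size:
            frontier.append((name, seconds, size))
            best_size = size
    return frontier


def sweep(data):
    """Run a handful of (compressor, level) settings over the same input."""
    results = []
    for level in (1, 6, 9):
        results.append(
            measure(f"zlib-{level}", lambda d, l=level: zlib.compress(d, l), data)
        )
        results.append(
            measure(f"bz2-{level}", lambda d, l=level: bz2.compress(d, l), data)
        )
    for preset in (0, 6):
        results.append(
            measure(f"lzma-{preset}", lambda d, p=preset: lzma.compress(d, preset=p), data)
        )
    return results


if __name__ == "__main__":
    # Synthetic CSV-like sample (hypothetical data, not from the benchmark).
    rows = (b"%d,item%d,%d\n" % (i, i % 50, i * 7) for i in range(20000))
    sample = b"id,name,value\n" + b"".join(rows)
    for name, seconds, size in pareto_frontier(sweep(sample)):
        print(f"{name:8s} {seconds * 1000:8.2f} ms {size:8d} bytes")
```

Timings depend on the machine and the input, so the frontier you get will vary; the point is only that default arguments give one point per compressor, while sweeping levels gives a curve.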

2 comments

Yeah, I thought per-algorithm parameters should indeed be added in a next version :) It takes more compute, but it's definitely an improvement. I think pandas doesn't offer zstd as an option with csv, but I'll check once more.

EDIT: indeed, it's missing in `to_csv` - seems like an oversight.

+1 - and if you could use zstd with a custom dictionary, then you could achieve even better compression ratios
Yeah, zstd is really amazing... if I had to choose a single one all the time, it'd be zstd for sure.
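The custom-dictionary point can be illustrated with a small sketch. To stay dependency-free it does not use zstd: it swaps in zlib's preset-dictionary support (`zdict`) from the Python standard library, which shows the same idea of priming the compressor with bytes that are expected to recur. The helper names and the sample record are hypothetical, and the gains you would see with a trained zstd dictionary on real data will differ.

```python
import zlib


def compress_plain(payload):
    """Compress without any shared dictionary."""
    return zlib.compress(payload, 9)


def compress_with_dict(payload, dictionary):
    """Compress with a preset dictionary that both sides must already have."""
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    return compressor.compress(payload) + compressor.flush()


def decompress_with_dict(blob, dictionary):
    """Decompress data produced by compress_with_dict with the same dictionary."""
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()


# Hypothetical dictionary: boilerplate that recurs in every small record.
DICTIONARY = b'{"user_id": , "event": "page_view", "country": "US", "status": "ok"}'
RECORD = b'{"user_id": 4821, "event": "page_view", "country": "US", "status": "ok"}'

if __name__ == "__main__":
    plain = compress_plain(RECORD)
    primed = compress_with_dict(RECORD, DICTIONARY)
    print(f"raw={len(RECORD)} plain={len(plain)} with_dict={len(primed)}")
```

Dictionaries help most on many small, similar payloads, where each one is too short for the compressor to learn the repeated structure on its own; on one large file the benefit is usually small.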