Hacker News
by jcranmer 637 days ago
Using gzip as a baseline, bzip2 provides only modest benefits: about a 25% improvement in compression ratio, with somewhat more expensive compression times (2-3×) and horrifically slow decompression times (>5×). xz offers a more compelling compression ratio (about 40-50% better), at the cost of extremely expensive compression time (like 20×), but with decompression time comparable to gzip. zstd, the newest kid on the block, achieves a slighter benefit in compression ratio (~10%) at the same compression time/decompression time as gzip, but it's also tunable to give you results as good as xz (while being as slow as xz).

What it comes down to is, if you care about compression time, gzip is the winner; if you care about compression ratio, then go with xz; if you care about tuning compression time/compression ratio, go with zstd. bzip2 just isn't compelling in either metric anymore.

3 comments

> at the same compression time/decompression time as gzip

In my experience zstd is considerably faster than gzip for compression and decompression, especially considering zstd can utilize all cores.

gzip is inferior to zstd in practically every way, no contest.

Practically, compatibility matters too, and it's hard to beat gzip there.
The benefit from zstd is, however, so great that I even copied the zstd binary to a server I was managing where I couldn't easily compile it from scratch. Seriously, bundling the zstd binary is worth it by now.
If you can control both sides, definitely go for it!

But in many cases, we unfortunately can't (gzip/Deflate is baked into tons of non-updateable hardware devices for example).

> if you care about compression time, gzip is the winner

Not at all. Lots of benchmarks show zstd being almost one order of magnitude faster, before even touching the tuning.

Adding to this: I like looking at graphs like https://calendar.perfplanet.com/images/2021/leon/image1.png . In this particular example, the "lzma" (i.e. xz) line crosses the zstd line, meaning that xz will compress faster for some target ratios, zstd for others. Meanwhile zlib is completely dominated by zstd.
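To make "crosses" versus "completely dominated" concrete, here is a minimal sketch in Python. Each curve is a list of (compression seconds, compression ratio) points, one per compression level. The function name `dominates` and all the numbers are made up for illustration; they are not read off the chart and do not describe any real compressor.

```python
def dominates(a, b):
    """Return True if every point on curve b is matched or beaten by some
    point on curve a, i.e. a is at least as fast AND at least as small."""
    return all(
        any(ta <= tb and ra >= rb for ta, ra in a)
        for tb, rb in b
    )


# Hypothetical (seconds, ratio) points, invented purely for this example.
curve_a = [(1.0, 3.0), (4.0, 3.6)]
curve_b = [(2.0, 2.8), (5.0, 3.2)]   # always slower and worse than curve_a
curve_c = [(3.0, 3.5), (6.0, 4.2)]   # slower at low ratios, reaches higher ones

print(dominates(curve_a, curve_b))   # curve_b never wins
print(dominates(curve_a, curve_c), dominates(curve_c, curve_a))  # neither wins outright
```

When neither curve dominates the other, the lines cross somewhere, and which tool is the better pick depends on the ratio you are aiming for.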

Different machines and different content will change the results, as will the optimization work that's gone into these libraries since someone made that chart in 2021.
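If you want numbers for your own machine and your own data, a rough sketch like the following is enough to get started. It uses only the gzip, bz2 and lzma modules from the Python standard library at their default settings, which are not necessarily the levels behind the figures quoted in this thread. zstd is left out because it needs either a recent Python or a third-party package. The function name `benchmark` and the sample input are placeholders, and a single timed run like this is noisy.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> dict:
    """Time one compress/decompress round trip per codec at default settings."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
        "xz": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        unpacked = decompress(packed)
        t2 = time.perf_counter()
        assert unpacked == data  # sanity check: the round trip is lossless
        results[name] = {
            "ratio": len(data) / len(packed),
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results


if __name__ == "__main__":
    # Placeholder input; replace with bytes read from a file you care about.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 20000
    for name, r in benchmark(sample).items():
        print(f"{name:6s} ratio={r['ratio']:.1f}x "
              f"compress={r['compress_s']:.4f}s "
              f"decompress={r['decompress_s']:.4f}s")
```

Highly repetitive input like the placeholder above flatters every codec, so treat the output as a smoke test and rerun on representative files before drawing conclusions.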