| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by hk1337 639 days ago
	Why do people use gzip more often than bzip? There must be some benefit but I don’t really see it, you can split and join two bzipped files (presumably CSV so you can see the extra rows). Bzip seems to compress better than gzip too.

6 comments

jcranmer 639 days ago

Using gzip as a baseline, bzip2 provides only modest benefits: about a 25% improvement in compression ratio, with somewhat more expensive compression times (2-3×) and horrifically slow decompression times (>5×). xz offers a more compelling compression ratio (about 40-50% better), at the cost of extremely expensive compression time (like 20×), but comparable decompression time to gzip. zstd, the newest kid on the box, can achieve more slight benefits to compression ratio (~10%) at the same compression time/decompression time as gzip, but it's also tunable to give you as good results as xz (as slow as xz does).

What it comes down to is, if you care about compression time, gzip is the winner; if you care about compression ratio, then go with xz; if you care about tuning compression time/compression ratio, go with zstd. bzip2 just isn't compelling in either metric anymore.

umvi 639 days ago

> at the same compression time/decompression time as gzip

In my experience zstd is considerably faster than gzip for compression and decompression, especially considering zstd can utilize all cores.

gzip is inferior to zstd in practically every way, no contest.

lxgr 638 days ago

Practically, compatibility matters too, and it's hard to beat gzip there.

lifthrasiir 638 days ago

The benefit from zstd is however so great that I even copied the zstd binary to some server I was managing but couldn't easily compile it from scratch. Seriously, bundling zstd binary is that worthy by now.

lxgr 638 days ago

If you can control both sides, definitely go for it!

But in many cases, we unfortunately can't (gzip/Deflate is baked into tons of non-updateable hardware devices for example).

Too 638 days ago

> if you care about compression time, gzip is the winner

Not at all. Lots of benchmarks show zstd being almost one order of magnitude faster, before even touching the tuning.

andrewf 638 days ago

Adding to this: I like looking at graphs like https://calendar.perfplanet.com/images/2021/leon/image1.png . In this particular example, the "lzma" (ie xz) line crosses the zstd line, meaning that xz will be compress faster for some target ratios, zstd for others. Meanwhile zlib is completely dominated by zstd.

Different machines and different content will change the results, as will the optimization work that's gone into these libraries since someone made that chart in 2021.

rwmj 639 days ago

Faster than most alternatives, good enough, but most importantly very widely available. Zstd is better on most axes (than bzip as well), except you can't be sure it's always there on every machine and in every language and runtime. zlib/gzip is ubiquitous.

We use xz/lzma when we need a compressed format that you can seek through the compressed data.

duskwuff 639 days ago

bzip2 is substantially slower to compress and decompress, and uses more memory.

It does achieve higher compression ratios on many inputs than gzip, but xz and zstd are even better, and run faster.

masklinn 639 days ago

TBF zstd runs most of the gamut, so depending on your settings you can have it run very fast at a somewhat limited level of compression or much lower at a very high compression.

Bzip is pretty completely obsolete though. Especially because of how ungodly slow it is to decompress.

duskwuff 639 days ago

> TBF zstd runs most of the gamut

Yep. But bzip2 is much less flexible; reducing its block size from the default of 900 kB just reduces its compression ratio. It doesn't make it substantially faster; the algorithm it uses is always slow (both to compress and decompress). There's no reason to use it when zstd is available.

masklinn 639 days ago

Oh I completely agree, as I said bzip2 is obsolete as far as I’m concerned.

I was mostly saying zstd is not just comparable to xz (as a slow but high-compression ratio format), it’s also more than competitive with gzip, if it’s available the default configuration (level 3) will very likely compress faster and use less CPU and yield a smaller file size than gzip, though I’m pretty sure it uses more memory to do that (because of the larger window if nothing else).

qhwudbebd 638 days ago

I agree about the practical utility of bzip2. It's quite an interesting historical artefact, though, as it's the only one of these compression schemes that isn't dictionary-based. The Burrows-Wheeler transform is great fun to play with.

zie 639 days ago

Muscle memory. We've been doing gzip for decades and we are too lazy to remember the zstd commands to tar, assuming the installed version of tar has been updated.

gloflo 639 days ago

... --auto-compress ... foo.tar.zstd

zie 639 days ago

That's cool! Is that a GNU tar only thing? Based on it being a longopt, I'm guessing a GNU tar only thing. That's the problem with these things, it takes a while to get pushed to all the installed copies of tar running around. Perhaps it's time to check:

  * MacOS Sonoma(14.6) has tar --auto-compress and --zstd
  * OpenBSD tar does not appear to have it: https://man.openbsd.org/tar
  * FreeBSD does: https://man.freebsd.org/cgi/man.cgi?query=tar

Not quite fully baked yet.

qhwudbebd 638 days ago

Both libarchive ("bsdtar") and GNU tar have -a, which I guess are the only two upstream tar implementations that are still relevant? You're right, it can take a while for these things to propagate downstream though.

zie 638 days ago

Some of us use OpenBSD: https://man.openbsd.org/tar

qhwudbebd 635 days ago

Ooh interesting. I'd assumed (incorrectly) that OpenBSD tar was just libarchive like FreeBSD and NetBSD. I prefer BSD libarchive tar to GNU tar as /bin/tar on my Linux machines too for what it's worth.

oneshtein 639 days ago

gzip is fast (pigz is even faster), supported everywhere (even in DOS), uses low amounts of memory, and compresses well enough for practical needs.

bzip2 is too slow.

xz is too complex (see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1068024 ), designed to compress .exe files.

lzip is good, but less popular.

zstd is good and fast, but less popular.

Spooky23 639 days ago

Another factor is that gzip is over 30 years old and ubiquitous in many contexts.

Zstd is awesome, but has only been around for a decade, but seems to be growing.

sgarland 638 days ago

Parallel compression (pigz [0]) and decompression (rapidgzip [1]), for one. When you're dealing with multi-TB files, this is a big deal.

[0]: https://github.com/madler/pigz

[1]: https://github.com/mxmlnkn/rapidgzip