by apendleton
1867 days ago
I agree with the general premise that there's no reason to ever use gzip anymore (unless you're in an environment where you can't install stuff), but interestingly, my experience with the tradeoffs apparently differs from yours. I find that zstd and gzip give pretty similar compression ratios on the data I usually work with, but zstd is way faster, and xz offers better compression ratios than either but is slow. So my personal decision matrix is: "if I really care about compression, use xz; if I want pretty good compression and great speed (that is, if before I would have used gzip), use zstd; and if I really want the fastest possible speed and can give up some compression, use lz4."
1. There are two speeds: compression and decompression. lz4 only beats zstd when decompressing ("zstd -1" compresses faster than lz4, and you can crank it up several levels and still beat lz4_hc on compression speed). bzip2 is actually fairly competitive at compression speed for the ratios it achieves, but loses badly at decompression.
2. "zstd --ultra -22" gives nearly the same compression ratio as xz on a corpus I just tested (an old Gentoo distfiles snapshot) while decompressing much faster. (I didn't compare compression speeds because the files were already xz-compressed.)
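To check points 1 and 2 on your own data, it helps to time compression and decompression separately and compute the ratio from the same run. Below is a minimal sketch using only Python's standard library, so it covers DEFLATE (via `zlib`, the algorithm gzip uses), bzip2 (`bz2`) and xz (`lzma`); zstd and lz4 need third-party bindings and are deliberately left out. The codec labels and the `benchmark`/`load_xz` helpers are hypothetical names for illustration. Because a corpus like the one above may already be xz-compressed, `load_xz` decompresses each file first so it can be re-compressed with every codec.

```python
import bz2
import lzma
import sys
import time
import zlib
from typing import Callable, Dict, Tuple

# Hypothetical labels mapped to (compress, decompress) pairs from the stdlib.
# zstd and lz4 are not in the standard library, so they are omitted here.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]
CODECS: Dict[str, Codec] = {
    "deflate -6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "bzip2 -9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "xz -6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}


def load_xz(path: str) -> bytes:
    """Decompress an existing .xz file so it can be re-compressed with other codecs."""
    with lzma.open(path, "rb") as f:
        return f.read()


def benchmark(data: bytes) -> Dict[str, Tuple[float, float, float]]:
    """Return {codec: (ratio, compress_seconds, decompress_seconds)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        unpacked = decompress(packed)
        t2 = time.perf_counter()
        # Sanity check: every codec here is lossless.
        if unpacked != data:
            raise RuntimeError(f"{name} did not round-trip")
        results[name] = (len(data) / len(packed), t1 - t0, t2 - t1)
    return results


if __name__ == "__main__":
    # Usage: python bench.py file1.xz file2.xz ...
    corpus = b"".join(load_xz(p) for p in sys.argv[1:]) or b"example text " * 10_000
    for name, (ratio, c_s, d_s) in benchmark(corpus).items():
        print(f"{name:12} ratio {ratio:6.2f}  compress {c_s:7.3f}s  decompress {d_s:7.3f}s")
```

Timings on small inputs are noisy, so a real comparison wants a large corpus and several repetitions. Higher xz settings such as `preset=9 | lzma.PRESET_EXTREME` can be added as another entry, but they use a much larger dictionary and therefore a lot more memory.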
[edit]
Arch Linux (which likely tested a larger corpus than I did) reported a 0.8% size regression when switching from xz to zstd at compression level 20. This supports your assertion that xz beats zstd on compression ratio.
[edit2]
bzip2, by accident[1], handily outperforms every other compression algorithm I've tried on large files that are all zeros. For example, 1 GB of zeros compressed with "dd if=/dev/zero bs=$((1024*1024)) count=1024 | bzip2 -9 > foo.bz2" produces a file of only 785 bytes, versus 33k for zstd and 153k for xz. Of course, my non-code-golfed script for generating 1 GB of zeros is only 38 bytes...
1: There was a bug in the original BWT implementation that had degenerate performance on long strings of identical bytes, so bzip2 includes an RLE pass before the BWT.
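The footnote's explanation can be illustrated with a simplified sketch of that initial run-length pass as it is commonly described for bzip2: a run of 4 to 255 identical bytes is written as four copies of the byte followed by one byte holding the number of extra repeats (0 to 251). This is a Python reconstruction for illustration, not code from bzip2 itself, and `rle1_encode`, `rle1_decode` and `zero_sizes` are hypothetical names. Under this scheme 255 zero bytes become 5 bytes, so a long run of zeros shrinks roughly 51x before the BWT even starts; since bzip2's block-size limit applies after this pass, each 900 kB block at `-9` covers about 46 MB of original input. `zero_sizes` re-runs the zeros experiment on a smaller buffer with the stdlib codecs (zstd is left out because it is not in the standard library).

```python
import bz2
import lzma
import zlib
from typing import Dict


def rle1_encode(data: bytes) -> bytes:
    """Simplified bzip2-style initial RLE: runs of 4..255 identical bytes
    become 4 literal bytes plus one count byte (run length - 4)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        run = 1
        # Extend the run, capping it at 255 bytes.
        while i + run < n and data[i + run] == b and run < 255:
            run += 1
        if run >= 4:
            out += bytes([b]) * 4
            out.append(run - 4)
        else:
            out += bytes([b]) * run
        i += run
    return bytes(out)


def rle1_decode(data: bytes) -> bytes:
    """Inverse of rle1_encode: after 4 identical bytes, the next byte is a repeat count."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        run = 1
        # Count up to 4 identical literal bytes.
        while i + run < n and data[i + run] == b and run < 4:
            run += 1
        out += bytes([b]) * run
        i += run
        if run == 4:
            # The byte after four identical literals is the extra-repeat count.
            out += bytes([b]) * data[i]
            i += 1
    return bytes(out)


def zero_sizes(n_bytes: int) -> Dict[str, int]:
    """Compressed size of n_bytes zero bytes for each stdlib codec."""
    zeros = bytes(n_bytes)
    return {
        "bzip2 -9": len(bz2.compress(zeros, 9)),
        "xz -6": len(lzma.compress(zeros, preset=6)),
        "deflate -9": len(zlib.compress(zeros, 9)),
    }


if __name__ == "__main__":
    print(len(rle1_encode(bytes(255))))  # 4 zero bytes + count byte 251 -> 5
    print(zero_sizes(16 * 1024 * 1024))
```

Raising `n_bytes` to `1 << 30` mirrors the 1 GB experiment, at the cost of allocating a 1 GiB buffer.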