| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by aidenn0 1866 days ago

A few comments:

1. There are two speeds: compression and decompression; lz4 only beats zstd when decompressing ("zstd -1" will compress faster than lz4, and you can crank that up several levels and still beat lz4_hc on cmopression). bzip2 is actually fairly competitive at compression for the ratios it achieves but loses badly at decompression.

2. "zstd --ultra -22" is nearly identical compression to xz on a corpus I just tested (an old gentoo distfiles snapshot) while decompressing much faster (I didn't compare compression speeds because the files were already xz compressed).

[edit]

Arch linux (which likely tested a larger corpus than I) reported a 0.8% regression in size when switching from xz to zstd using a compression level 20. This supports your assertion that xz will beat zstd in compression ratio.

[edit2]

bzip2 accidentally[1] outperforms all other compression algorithms I've tried handily on large files that are all zero; for example 1GB of zeroes with "dd if=/dev/zero bs=$((1024*1024)) count=1024 |bzip2 -9 > foo.bz2" generates a file that is only 785 bytes. zstd is 33k and xz is 153k. Of course my non-codegolfed script for generating 1GB of zeros is only 38 bytes...

1: There was a bug in the original BWT implementation that had degenerate performance on long strings of identical bytes, so bzip2 includes an RLE pass before the BWT.