by jrockway 1110 days ago
There are a variety of use cases that dictate which algorithm is going to perform best. For example, you might use Zstandard -19 if you are compressing something once and transferring it over a slow network to millions of people. You might use LZ4 if you are generating a unique large piece of data interactively for thousands of concurrent users, because it compresses faster than Zstandard. Basically, if you're constrained by network bandwidth, Zstandard; if you're constrained by CPU, LZ4.
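
To make the bandwidth-versus-CPU rule of thumb concrete, here is a rough sketch of a pipelined send where compression and transfer overlap. The two codecs and their numbers are hypothetical placeholders (not measurements of Zstandard or LZ4), and the model ignores decompression cost and the "compress once, serve millions" amortization mentioned above.

```python
# Toy model: data is compressed and sent in a pipeline, so the end-to-end rate
# (in MiB/s of *uncompressed* input) is limited by the slower of the two stages.
# ratio = compressed size / original size, so lower is better.

# Hypothetical codecs for illustration only; these are not benchmark results.
CODECS = {
    "fast":  {"compress_mib_s": 400.0, "ratio": 0.70},
    "small": {"compress_mib_s": 100.0, "ratio": 0.40},
}

def effective_throughput(compress_mib_s: float, ratio: float, network_mib_s: float) -> float:
    """Input MiB/s delivered when compression and transfer run concurrently."""
    # A link carrying network_mib_s of compressed bytes moves
    # network_mib_s / ratio of original bytes per second.
    return min(compress_mib_s, network_mib_s / ratio)

def best_codec(network_mib_s: float) -> str:
    """Name of the codec with the highest effective throughput on this link."""
    return max(
        CODECS,
        key=lambda name: effective_throughput(
            CODECS[name]["compress_mib_s"], CODECS[name]["ratio"], network_mib_s
        ),
    )

if __name__ == "__main__":
    for link in (10.0, 1000.0):
        print(f"{link:>7.0f} MiB/s link -> {best_codec(link)}")
```

With these made-up numbers the slow link favors the codec with the smaller output, and the fast link favors the one that compresses faster, which is the same tradeoff in miniature.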

There are then legacy formats that have stuck around long past their sell-by date, like gzip. People are used to using gzip, so you see it everywhere, but it's slower and compresses worse than Zstandard, so there is no reason why you'd ever use it except for compatibility with legacy systems. (bzip2, 7z, xz, snappy, etc. also live in this "no reason to use in 2023" space.)

Take a look at performance measurements here: https://jolynch.github.io/posts/use_fast_data_algorithms/. For example, gzip can get a compression ratio of 0.41 at 21MiB/s, while Zstandard does 0.38 (better) at 134MiB/s. (Meanwhile, lz4 produces outputs nearly twice as large as Zstandard, but compresses almost 3x faster and decompresses 2.5x faster.)
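
If you want to produce numbers of this shape for your own data, a minimal sketch using only Python's standard library looks something like the following. It uses zlib (the DEFLATE implementation behind gzip) as a stand-in, since Zstandard and LZ4 bindings are third-party; the sample input is a synthetic assumption and far more repetitive than real data, so don't read anything into the absolute figures.

```python
import time
import zlib

def measure(data: bytes, level: int) -> tuple[float, float]:
    """Return (ratio, MiB/s) for one zlib compression pass at the given level.

    ratio is compressed size / original size, so lower is better.
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(compressed) / len(data)
    mib = len(data) / (1024 * 1024)
    speed = mib / elapsed if elapsed > 0 else float("inf")
    return ratio, speed

# Synthetic, highly repetitive sample; swap in a real file for meaningful results.
sample = b"the quick brown fox jumps over the lazy dog\n" * 50_000

if __name__ == "__main__":
    for level in (1, 6, 9):
        ratio, speed = measure(sample, level)
        print(f"zlib level {level}: ratio {ratio:.3f}, {speed:.0f} MiB/s")
```

A single timed pass is noisy; for anything you plan to act on, repeat the measurement several times and use a corpus that resembles your actual payloads.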

Lossy compression is even more complicated because the compression algorithms take advantage of "nobody will notice" in a way that's data dependent; so music, video, and photographs all have their own special algorithms.

1 comment

Your link seems to compare GNU gzip with zstd. When comparing file formats, I would compare the best software for each format. igzip (https://github.com/intel/isa-l) decompresses consistently faster than GNU gzip: depending on the file, it is 2-3x faster, which makes it almost as fast as zstd decompression. I have less experience with compression benchmarks, but a quick benchmark on Silesia shows igzip compressing ~7x faster, at the cost of about 10% of compression ratio even on its highest compression setting. It seems to be optimized for speed.