by alecco 5213 days ago
It's good that people get interested in the subject. But this is very odd and has some errors. For example, xz requires a lot more memory than bzip2 (see the benchmarks below, Mem column).

http://mattmahoney.net/dc/text.html

http://mattmahoney.net/dc/uiq/

Matt Mahoney maintains the best benchmarks on text and generic compression. Some of the best in the field (like Matt) usually hang out at encode.ru.


By resources I mostly meant wall-time, which is what I am optimizing for (I move >200GB zfs snapshots around a lot). I did not pay attention to memory specifically; in that light I updated the side note.

What else did you find odd/wrong?

You don't mention ratios. Better compression becomes asymptotically/exponentially harder when it gets close to the optimal code size.

Thanks for the shout out. BTW Matt's benchmarks are for all major compression engines, not only bzip2.

That benchmark runs xz with the -9e flags, which turn on its slowest and most memory-intensive mode. If you pass it -0 it only needs 3 MB to compress and 1 MB to decompress.

I usually use -0 with xz because it is extremely fast and memory-efficient yet still compresses better than gzip or bzip2. You can also use -0e for a slower, better compression that still requires only 1 MB to decompress. This way the decompressor can run entirely within the CPU's cache.
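To check the size/speed side of this on your own data, here is a minimal sketch using Python's standard `lzma` module, which wraps the same liblzma library as the xz tool. The function name `compare_xz_presets` and the flag labels are made up for illustration. It only measures compressed size and wall-time; it does not measure memory use, so the 3 MB / 1 MB figures above are not verified by it.

```python
import lzma
import time

# Labels mirror the xz command-line flags discussed above (illustrative mapping).
PRESETS = {
    "-0": 0,
    "-0e": 0 | lzma.PRESET_EXTREME,
    "-9e": 9 | lzma.PRESET_EXTREME,
}

def compare_xz_presets(data: bytes, presets=None) -> dict:
    """Return {flag: (compressed_size_in_bytes, seconds)} for each preset."""
    presets = PRESETS if presets is None else presets
    results = {}
    for flag, preset in presets.items():
        start = time.perf_counter()
        packed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
        elapsed = time.perf_counter() - start
        # Sanity check: the stream must decompress back to the input.
        if lzma.decompress(packed) != data:
            raise ValueError(f"round trip failed for preset {flag}")
        results[flag] = (len(packed), elapsed)
    return results
```

Feed it a representative chunk of what you actually compress (for example the first few hundred MB of a snapshot), since results on tiny or synthetic inputs say little about real workloads.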

Yes, it's in the benchmarks. But comparing against bzip2 is not very meaningful; it's more relevant to compare with compressors in the same efficiency range. With those flags it compresses to 26MB, and in that range there are many equivalent ROLZ/LZP engines with similar numbers. For example, csc32 is a bit faster but uses more memory.

http://mattmahoney.net/dc/text.html#2118

http://mattmahoney.net/dc/text.html#2300

Proper analysis would need benchmarking with different data and different flags for all compressors.
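As a rough starting point for that kind of comparison, here is a small sketch of a harness that runs several compressors at several levels over several inputs. It is limited to what ships in the Python standard library (`zlib` for deflate as used by gzip, `bz2`, `lzma`), so it covers none of the ROLZ/LZP engines mentioned above. The `CODECS` table, the chosen levels, and the `benchmark` function are assumptions for illustration, and it records only ratio and wall-time, not memory.

```python
import bz2
import lzma
import os
import time
import zlib

# (codec name, level) -> compress function. Levels are illustrative, not exhaustive.
CODECS = {
    ("zlib (deflate, as in gzip)", 1): lambda d: zlib.compress(d, 1),
    ("zlib (deflate, as in gzip)", 9): lambda d: zlib.compress(d, 9),
    ("bzip2", 1): lambda d: bz2.compress(d, 1),
    ("bzip2", 9): lambda d: bz2.compress(d, 9),
    ("xz", 0): lambda d: lzma.compress(d, preset=0),
    ("xz", 6): lambda d: lzma.compress(d, preset=6),
}

def benchmark(datasets: dict) -> list:
    """Return rows of (dataset, codec, level, ratio, seconds).

    ratio is compressed size / original size, so lower is better.
    """
    rows = []
    for data_name, data in datasets.items():
        for (codec, level), compress in CODECS.items():
            start = time.perf_counter()
            packed = compress(data)
            elapsed = time.perf_counter() - start
            rows.append((data_name, codec, level, len(packed) / len(data), elapsed))
    return rows

def print_report(rows: list) -> None:
    """Print one line per (dataset, codec, level) combination."""
    for data_name, codec, level, ratio, secs in rows:
        print(f"{data_name:10} {codec:28} level {level}: ratio {ratio:.3f}, {secs:.3f}s")
```

For example, `print_report(benchmark({"text": open("some.txt", "rb").read(), "random": os.urandom(1 << 20)}))` would contrast a compressible input with an incompressible one; `some.txt` is a placeholder for whatever data you care about.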