by swombat 5833 days ago
So bzip2 and 7-zip are way, way slower than gzip, then?

Bandwidth is cheap. Stick to gzip.

It's not as simple as that. Which one is better depends on the use case. If you're sending a one-off file to somebody, sure, gzip is better. But if you want to distribute a file to a large number of people (like Linux distributions do with their packages), the extra CPU time is insignificant compared to the bandwidth saved over the course of thousands of downloads.
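
To put rough numbers on that, here is a minimal sketch of the trade-off. The function name and every figure in it are hypothetical, chosen only to illustrate the arithmetic; they are not measurements from this thread.

```python
def distribution_tradeoff(size_small_mb, size_large_mb, downloads,
                          extra_compress_s, extra_decompress_s):
    """Return (bandwidth saved in MB, extra CPU seconds) for a stronger compressor.

    All inputs are assumptions supplied by the caller:
      size_small_mb      - package size with the stronger compressor
      size_large_mb      - package size with gzip
      extra_compress_s   - extra time the packager spends compressing (paid once)
      extra_decompress_s - extra time each downloader spends decompressing
    """
    saved_mb = (size_large_mb - size_small_mb) * downloads
    # Compression is paid once; decompression is paid on every download.
    extra_cpu_s = extra_compress_s + extra_decompress_s * downloads
    return saved_mb, extra_cpu_s


# Hypothetical package: 100 MB with gzip, 70 MB with a stronger compressor.
saved_mb, extra_cpu_s = distribution_tradeoff(
    size_small_mb=70, size_large_mb=100, downloads=10_000,
    extra_compress_s=120, extra_decompress_s=6)
print(f"bandwidth saved: {saved_mb / 1000:.0f} GB, extra CPU: {extra_cpu_s / 3600:.1f} h")
```

Note that the extra CPU time is spread across the downloaders' machines, while the bandwidth saving lands mostly on whoever hosts the file.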

It's very annoying to wait minutes for big files to decompress, especially during installation.
Decompression is more often limited by disk I/O, in my experience, particularly when the source and destination are the same disk. I can often get large improvements in decompression and installation speed by putting the source file and / or temporary installation files on a different disk.
It's not always I/O speed. During installation you can see CPU usage go to 100% (or hear the fans kick in) for BWT/LZM* but not for DEFLATE (unless you use -9 or something like that). While you install something, at least one of your cores is unavailable for anything else.

This affects energy consumption, too.

And think about both mobile and servers. Those systems are usually more sensitive to high CPU load.

I have a draft blog post with analysis of different protocols with valgrind and other tools. But there is so much data to present and graph that I never get around to finishing it :(

If you look at some of the stats people are posting, it's the compression that takes the most time, not the decompression. gzip has fast compression and decompression, which is why it's used for things like compressing network streams (HTTP, SSH, etc.). But when you want to package up large files for distribution to a large audience, it makes more sense to throw some extra CPU time at the compression to get a smaller package (so long as the decompression time on the other end is reasonable).
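
If you want to check this on your own data rather than trust posted stats, a rough sketch using Python's standard `gzip`, `bz2`, and `lzma` modules is below. The helper names `measure` and `run` and the sample input are my own choices for illustration; results depend heavily on the data, the compression level, and the machine, so treat this as a starting point and not a proper benchmark.

```python
import bz2
import gzip
import lzma
import time


def measure(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the packed size."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    assert unpacked == data  # the round trip must be lossless
    return {"name": name, "size": len(packed),
            "compress_s": t1 - t0, "decompress_s": t2 - t1}


def run(data):
    """Run all three stdlib codecs at their default settings."""
    codecs = [
        ("gzip", gzip.compress, gzip.decompress),    # DEFLATE
        ("bzip2", bz2.compress, bz2.decompress),     # BWT-based
        ("xz", lzma.compress, lzma.decompress),      # LZMA2 in an .xz container
    ]
    return [measure(name, c, d, data) for name, c, d in codecs]


# Placeholder input; substitute the bytes of a real file you care about.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 20_000
for row in run(sample):
    print(f"{row['name']:6} {row['size']:>8} bytes  "
          f"compress {row['compress_s']:.3f}s  decompress {row['decompress_s']:.3f}s")
```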

  > If you look at some of the stats people are posting, it's the
  > compression that takes the most time, not the decompression
5 vs. 11 seconds. Worse than 2x slower decompression:

http://news.ycombinator.com/item?id=1458697

If you have to wait minutes to download the files, it doesn't matter, but if you already have the file locally, it is very annoying.

Also, if this is used extensively on projects with a large server deployment, it matters even more for latency and energy consumption. That's why Google has its own compression algorithms derived from BMDiff and LZW (Zippy). Think about it. Speed matters.

Are you willing to donate money to your favorite open source software so they can afford the bandwidth? If not, don't complain about having to spend a few more seconds decompressing the latest release (which you're getting for free).
As a programmer, I would rather work on a patent-unencumbered and open source compression algorithm solving exactly this problem. Perhaps investing months of my own unpaid time on it. HINT
... such as BitTorrent distribution.
Mileage may vary. If the time it takes to compress costs much less than the time it takes to transfer over the network, you might not want to use gzip. For example, if you're transferring a large file (1GB? 1TB?) to a remote person, is it cheaper to gzip (lower compression ratio), take longer for the network transfer (most likely the slowest step), and have the other person unzip, or to use a better compressor and get the file transferred quicker?
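
One way to answer that question for a specific case is to add up the three steps. The sketch below is a simplified model that assumes the steps run one after another (no streaming or pipelining); the function name, the throughput figures, and the compression ratios are all hypothetical placeholders that you would replace with your own measurements.

```python
def end_to_end_seconds(raw_mb, ratio, compress_mb_s, decompress_mb_s, link_mb_s):
    """Total time to compress, transfer, and decompress a file, done sequentially.

    ratio is compressed size / raw size (smaller means better compression).
    Throughputs are in MB of raw data per second for the CPU steps and
    MB of compressed data per second for the link.
    """
    compressed_mb = raw_mb * ratio
    return (raw_mb / compress_mb_s           # sender compresses
            + compressed_mb / link_mb_s      # network transfer
            + raw_mb / decompress_mb_s)      # receiver decompresses


# Hypothetical profiles for a fast and a strong compressor (not measured).
fast = dict(ratio=0.35, compress_mb_s=50, decompress_mb_s=200)
strong = dict(ratio=0.25, compress_mb_s=5, decompress_mb_s=50)

for link_mb_s in (0.25, 10):
    t_fast = end_to_end_seconds(1000, link_mb_s=link_mb_s, **fast)
    t_strong = end_to_end_seconds(1000, link_mb_s=link_mb_s, **strong)
    winner = "strong" if t_strong < t_fast else "fast"
    print(f"link {link_mb_s} MB/s: fast {t_fast:.0f}s, strong {t_strong:.0f}s -> {winner}")
```

With these made-up numbers the stronger compressor wins on the slow link and loses on the fast one, which is the "mileage may vary" point.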
The question in my mind is whether 7-zip has a good multicore implementation for compression/decompression. I recall that the multicore implementation by the original gzip author increased the speed on a 4-core machine by something like 3.75x.
Given the increasing CPU/network gap, it's only a matter of time before the bandwidth (and thus time) saved by XZ more than compensates for the slower compression.