by alecco 5833 days ago
It's very annoying to wait minutes to decompress big files, particularly during installation.
3 comments

Decompression is more often limited by disk I/O, in my experience, particularly when the source and destination are the same disk. I can often get large improvements in decompression and installation speed by putting the source file and / or temporary installation files on a different disk.
It's not always I/O speed. During installation you can see CPU usage go to 100% (or hear the fans kick in) for BWT/LZM* and not for DEFLATE (unless you use -9 or something like that). While you install something, at least one of your cores is unavailable for anything else.
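
One way to tell the two cases apart is to compare wall-clock time with CPU time for the same decompression run. This is only a rough sketch: the helper names (measure, drain_xz) and the 1 MiB chunk size are my own choices, and it only counts CPU time spent in the current process. If CPU time is close to wall time, the run is CPU-bound; if it is much lower, the process mostly waited on the disk.

```python
import lzma
import time


def measure(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    return result, wall_s, cpu_s


def drain_xz(path, chunk_size=1 << 20):
    """Decompress an .xz file from disk, discarding the output.

    Returns the number of decompressed bytes. The chunk size is arbitrary.
    """
    total = 0
    with lzma.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


if __name__ == "__main__":
    import sys

    # Usage: python script.py some_archive.xz
    size, wall_s, cpu_s = measure(drain_xz, sys.argv[1])
    print(f"{size} bytes, wall {wall_s:.2f}s, cpu {cpu_s:.2f}s")
```

Keep in mind that a second run on the same file will usually be served from the OS page cache, so it will look more CPU-bound than a cold run.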

This affects energy consumption, too.

And think about both mobile and servers. Those systems are usually more sensitive to high CPU load.

I have a draft blog post with analysis of different protocols with valgrind and other tools. But there is so much data to present and graph that I never get around to finishing it :(

If you look at some of the stats people are posting, it's the compression that takes the most time, not the decompression. gzip has fast compression and decompression, which is why it's used for things like compressing network streams (HTTP, SSH, etc.). But when you want to package up large files for distribution to a large audience, it makes more sense to throw some extra CPU time at the compression to get a smaller package (so long as the decompression time on the other end is reasonable).
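
If anyone wants to check this on their own data, here is a minimal sketch using only the Python standard library. The codec labels, the benchmark and sample_data helpers, and the synthetic input are all made up for illustration; real numbers depend heavily on the input and the machine, so swap in the files you actually care about.

```python
import bz2
import lzma
import time
import zlib


def time_codec(name, compress, decompress, data):
    """Time one compress/decompress round trip and report the size ratio."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    unpacked = decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError(f"{name}: round trip did not reproduce the input")
    return {
        "codec": name,
        "ratio": len(packed) / len(data),
        "compress_s": t1 - t0,
        "decompress_s": t2 - t1,
    }


def benchmark(data):
    """Run the same input through DEFLATE (zlib), bzip2 (BWT) and LZMA."""
    codecs = [
        ("zlib-6", lambda d: zlib.compress(d, 6), zlib.decompress),
        ("zlib-9", lambda d: zlib.compress(d, 9), zlib.decompress),
        ("bz2", bz2.compress, bz2.decompress),
        ("lzma", lzma.compress, lzma.decompress),
    ]
    return [time_codec(n, c, d, data) for n, c, d in codecs]


def sample_data(lines=200_000):
    """Synthetic, fairly compressible text; not representative of real packages."""
    return b"".join(b"line %d: the quick brown fox\n" % i for i in range(lines))


if __name__ == "__main__":
    for row in benchmark(sample_data()):
        print(
            f"{row['codec']:8s} ratio {row['ratio']:.3f}  "
            f"compress {row['compress_s']:.3f}s  "
            f"decompress {row['decompress_s']:.3f}s"
        )
```

This measures in-memory work only, so it leaves disk I/O out of the picture entirely.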

  > If you look at some of the stats people are posting, it's the
  > compression that takes the most time, not the decompression
5 vs. 11 seconds. Worse than 2x slower decompression:

http://news.ycombinator.com/item?id=1458697

If you have to wait minutes to download the files, it doesn't matter, but if you already have the file locally, it is very annoying.

Also, if this is used extensively on projects with a large server deployment, it matters even more for latency and energy consumption. That's why Google has their own compression algorithms, BMDiff and Zippy (an LZ77-style compressor). Think about it. Speed matters.

Are you willing to donate money to your favorite open source software so they can afford the bandwidth? If not, don't complain about having to spend a few more seconds decompressing the latest release (which you're getting for free).
As a programmer, I would rather work on a patent-unencumbered and open source compression algorithm solving exactly this problem. Perhaps investing months of my own unpaid time on it. HINT