Hacker News
by kbumsik 2360 days ago
> Recompressing all packages to zstd with our options yields a total ~0.8% increase in package size on all of our packages combined, but the decompression time for all packages saw a ~1300% speedup.

Impressive. As an AUR package maintainer, I am also wondering what the compression speed is like, though.


Compression speed is many, many, many times faster than xz, and (only) much faster than gzip. Really, only lz4 beats it.
After reading these comments, I can't help but wonder, what is the benefit of Zstd over lz4? Why didn't they switch to lz4 if it was the speed of the algorithm that they favored even with marginally worse compression ratios?
Where Zstd will reduce, say, 3x, Lz4 reduces only 2.5x. This doesn't seem very different until you look at it from the other end: my .zst file is 3.3 GB, but the .lz4 would have been 4 GB, which is 700 MB more.
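To make the arithmetic above concrete, here is a minimal sketch. The 10 GB original size is an assumption, picked only so that a 3x ratio lands near the quoted 3.3 GB; the ratios are the illustrative ones from the comment, not measurements.

```python
# Illustrative numbers only: the 10 GB original size is an assumption chosen
# so that a 3x ratio gives roughly the 3.3 GB mentioned above.
def compressed_size_gb(original_gb: float, ratio: float) -> float:
    """Size after compression, where ratio = original size / compressed size."""
    return original_gb / ratio

original_gb = 10.0
zstd_gb = compressed_size_gb(original_gb, 3.0)  # about 3.33 GB
lz4_gb = compressed_size_gb(original_gb, 2.5)   # 4.0 GB
extra_mb = (lz4_gb - zstd_gb) * 1000            # difference in MB

print(f"zstd: {zstd_gb:.2f} GB, lz4: {lz4_gb:.2f} GB, extra: {extra_mb:.0f} MB")
```

With these assumed inputs the gap comes out a little under 700 MB, which matches the rounded figure in the comment.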

There was a time when 700 MB mattered; it was as much as you could fit on a CD.

So, there is a place for each. I would set up the process to use Lz4 when testing, and Zstd for actual delivery to download archives.

In some circumstances, particularly when using a shared file server, Lz4 can be quite a lot faster than writing and reading data uncompressed.

I'm guessing that a 0.8x size increase for a 1300% speedup was worth the tradeoff, but a ≥1.5x size increase perhaps was not (especially considering that going from a 1300% to a 2000% speedup is not going to be user-visible for 99% of the packages).
It's not a 0.8 times size increase, it's a 0.008 times size increase, since the unit is percent. The latter seems pretty marginal to me.
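A quick sketch of the percent-versus-factor distinction. The 1000 MB total is a hypothetical figure used only to make the numbers easy to read.

```python
def percent_increase_to_factor(percent: float) -> float:
    """Convert an increase given in percent into a multiplicative factor."""
    return 1.0 + percent / 100.0

size_factor = percent_increase_to_factor(0.8)  # 0.8% more, i.e. a factor of 1.008

total_before_mb = 1000.0  # hypothetical combined package size
total_after_mb = total_before_mb * size_factor

print(f"factor: {size_factor:.3f}, after: {total_after_mb:.0f} MB")
```

So under this assumption 1000 MB of packages grows by about 8 MB, not by 800 MB.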
While the speedup is nice, pacman still seems to operate sequentially, i.e. it downloads, then decompresses, one package at a time. Decompressing while downloading, or decompressing in parallel, seems like low-hanging fruit that hasn't been picked yet and that wouldn't have needed any changes to the compressor.
I might be wrong, but wouldn't it be prudent to first verify the checksum/signature of the downloaded archive before unpacking it? Even when just decompressing, there's at least the danger of being zip-bombed (assuming a zip bomb can be constructed for any dictionary-based compression algorithm).

FWIW I really applaud Arch here. Even if it's just a small step. Commercial operating systems should take notice. OS updates should really not take as long as they (mostly) do.

Even then it could still be pipelined: download, check the signature, and decompress while the next download is running. But yeah, pacman is plenty fast already.
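The pipelining idea can be sketched roughly as follows. This is not how pacman works; it is a toy model with loud assumptions: the "mirror" is an in-memory dict, `zlib` stands in for zstd, and a SHA-256 checksum stands in for a real signature check. All names (`MIRROR`, `download`, `verify_and_unpack`, `install_pipelined`) are hypothetical.

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor

def make_package(payload: bytes):
    """Build a fake package: (compressed bytes, expected SHA-256 of those bytes)."""
    blob = zlib.compress(payload)  # zlib is only a stand-in for zstd here
    return blob, hashlib.sha256(blob).hexdigest()

# Hypothetical in-memory mirror. In a real setup the expected checksum or
# signature would come from a separately trusted source, not from the mirror.
MIRROR = {
    "pkg-a": make_package(b"a" * 1000),
    "pkg-b": make_package(b"b" * 2000),
    "pkg-c": make_package(b"c" * 3000),
}

def download(name: str) -> bytes:
    """Stand-in for a network fetch."""
    return MIRROR[name][0]

def verify_and_unpack(name: str, blob: bytes) -> bytes:
    """Check the checksum first; only decompress data that passed."""
    if hashlib.sha256(blob).hexdigest() != MIRROR[name][1]:
        raise ValueError(f"checksum mismatch for {name}")
    return zlib.decompress(blob)

def install_pipelined(names):
    """Overlap the download of package i+1 with verify/unpack of package i."""
    results = {}
    if not names:
        return results
    with ThreadPoolExecutor(max_workers=1) as downloader:
        pending = downloader.submit(download, names[0])
        for i, name in enumerate(names):
            blob = pending.result()  # wait for the current download
            if i + 1 < len(names):
                # Start fetching the next package before unpacking this one.
                pending = downloader.submit(download, names[i + 1])
            results[name] = verify_and_unpack(name, blob)
    return results

unpacked = install_pipelined(["pkg-a", "pkg-b", "pkg-c"])
print({name: len(data) for name, data in unpacked.items()})
```

Each archive is still verified in full before it is decompressed, so the ordering concern raised above is respected; only the next download overlaps with the current unpack.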
Since most people are interested in the time taken to compress/decompress rather than the speed at which it happens, it seems to me a better metric would be:

"... decompression time dropped to 14% of what it was..." (s/14/actual_value)
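For what it's worth, converting a quoted speedup into a "time is now X% of before" figure is a one-liner. The sketch below assumes nothing about the actual benchmark; it only shows the two common readings of "~1300% speedup", and the helper name `time_fraction` is made up for illustration.

```python
def time_fraction(speed_multiplier: float) -> float:
    """Fraction of the old time needed when the speed is multiplied by this factor."""
    return 1.0 / speed_multiplier

speedup_percent = 1300.0

# Reading 1: the new speed is 1300% of the old speed (13x).
as_multiple = speedup_percent / 100.0
# Reading 2: the speed increased by 1300% (old speed plus 13x, so 14x).
as_increase = 1.0 + speedup_percent / 100.0

print(f"13x reading: {time_fraction(as_multiple):.1%} of the old time")
print(f"14x reading: {time_fraction(as_increase):.1%} of the old time")
```

Either reading puts the new decompression time at roughly 7 to 8% of the old one, which is the kind of number the suggested phrasing would report.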