by O_H_E 1286 days ago
I'd imagine distributing tarballs is also an important use case.

Actually zstd makes that worse too, somewhat paradoxically. At least in this case, because Zig uses xz for their tarballs. (If they used gzip, it would be the other way around.)

The reason is that compression algorithms usually can't make further reductions when re-compressing already-compressed files. And xz has a higher compression ratio than zstd, so when you stick zig1.wasm.zst into a tar.xz file, xz is deprived of the opportunity to work its more powerful magic.
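
If you want to see the effect without downloading anything, here is a minimal Python sketch. It is not the test I ran: zlib stands in for zstd only because it ships with the standard library, the payload is synthetic (a block of random words repeated four times), and the names make_payload and nested_sizes are made up for illustration.

```python
import lzma
import random
import zlib


def make_payload(seed: int = 0) -> bytes:
    """Build compressible test data: one block of random words, repeated 4 times."""
    rng = random.Random(seed)
    # Hypothetical vocabulary; any small word list should behave similarly.
    words = [f"tok{i}" for i in range(64)]
    block = " ".join(rng.choice(words) for _ in range(20_000)).encode()
    return block * 4


def nested_sizes(payload: bytes) -> dict:
    """Sizes in bytes of the payload with and without an inner compression pass."""
    inner = zlib.compress(payload, 9)  # assumption: zlib as a stand-in for the .zst step
    return {
        "raw": len(payload),
        "inner": len(inner),
        "raw_then_xz": len(lzma.compress(payload)),  # xz sees the original bytes
        "inner_then_xz": len(lzma.compress(inner)),  # xz sees pre-compressed bytes
    }


if __name__ == "__main__":
    for name, size in nested_sizes(make_payload()).items():
        print(f"{name:>14}: {size}")
```

The pre-compressed payload should come out smaller than the raw one, while its .xz comes out larger than the .xz of the raw payload. The exact numbers depend on the data and the compressors, so treat this as an illustration of the mechanism, not a measurement.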

As a test, I got zig-0.11.0-dev.638+5c67f9ce7.tar.xz from https://ziglang.org/download/ , extracted it, and rebuilt the tar.xz myself. Then I replaced stage1/zig1.wasm.zst with the uncompressed stage1/zig1.wasm and rebuilt the tar.xz again.

Results:

    $ du -sk *tar*
    168136  zig.new.tar
    14500   zig.new.tar.xz
    166416  zig.orig.tar
    14568   zig.orig.tar.xz

So zig.orig.tar, the uncompressed tarball that contains zig1.wasm.zst, is indeed smaller than zig.new.tar. But the .tar.xz files are the other way around: zig.new.tar.xz is the smaller one.

Not using zstd saves 68K (14568 vs. 14500).

=-=-=

Also, in the process, I accidentally discovered something else that makes a bigger difference.

Since I knew the order of files within a tar archive can affect the compression ratio (due to data locality), while doing my test, I used "tar tf" to list my tar file's contents and compare it with what I downloaded. It didn't match, so I knew I wasn't doing an apples to apples comparison.

So I added "--sort=name" to my tar commands. And both of my tar files ended up smaller than the one I downloaded:

    $ du -sk zig-0.11.0-dev.638+5c67f9ce7.tar.xz 
    15152   zig-0.11.0-dev.638+5c67f9ce7.tar.xz

Just adding the "--sort=name" option to tar saves 584K (15152 vs. 14568)! That's around 4% of the entire tar.xz file. Locality matters more than I thought.
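
A contrived Python sketch of why member order matters. An LZ-style compressor can only reference data that is still inside its dictionary window, so similar files help each other only when they sit close together in the archive. Everything here is an assumption for illustration: the files are synthetic (identical random blobs within each directory), the dictionary is shrunk to 64 KiB so the effect shows up on small inputs, and make_files, interleaved_order and tar_xz_size are made-up names. It does not reproduce the Zig tarball numbers.

```python
import io
import lzma
import random
import tarfile


def make_files(families: int = 8, copies: int = 4, size: int = 48 * 1024) -> dict:
    """Return {path: content}; files in the same directory share identical content."""
    rng = random.Random(0)
    files = {}
    for f in range(families):
        # Random bytes: incompressible on their own, so only duplicates can help.
        blob = rng.getrandbits(size * 8).to_bytes(size, "little")
        for c in range(copies):
            files[f"dir{f}/file{c}.bin"] = blob
    return files


def interleaved_order(files: dict) -> list:
    """Order by file name first, then directory, so duplicates end up far apart."""
    return sorted(files, key=lambda p: p.split("/")[::-1])


def tar_xz_size(files: dict, order: list, dict_size: int = 64 * 1024) -> int:
    """Pack files into an in-memory tar in the given order; return the .xz size."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in order:
            info = tarfile.TarInfo(path)
            info.size = len(files[path])
            tar.addfile(info, io.BytesIO(files[path]))
    # Deliberately tiny dictionary (assumption) to mimic a window much
    # smaller than the archive.
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(buf.getvalue(), format=lzma.FORMAT_XZ, filters=filters))


if __name__ == "__main__":
    files = make_files()
    print("sorted by name:", tar_xz_size(files, sorted(files)))
    print("interleaved:   ", tar_xz_size(files, interleaved_order(files)))
```

With name order, each duplicate sits about 48 KiB after its twin and fits in the window. With the interleaved order, the twins are several hundred KiB apart and the compressor has to store every copy in full. A real source tree is far less extreme, but the same mechanism would explain why "--sort=name" helps.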