HN Mirror



	by sidewndr46 345 days ago
	What are you compressing with zstd? I had to do this recently, and the "xz" utility still blows it away in terms of compression ratio. In terms of memory and CPU usage, zstd wins by a large margin, but in my case I only really cared about compression ratio.


vlovich123 345 days ago

People tend to care about decompression speed: xz can be quite slow at decompressing highly compressed files, whereas zstd's decompression speed is largely independent of the compression level.

People also tend to care about how much compression time they spend for each incremental % of compression ratio, and zstd tends to sit on the Pareto frontier for that (at least among open-source algorithms).
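One rough way to see the trade-off described here is to sweep every compression level and record the output size alongside compression and decompression time. A minimal sketch, assuming Python's standard-library lzma module (which writes the same .xz format as the xz utility; zstd is left out to keep the sketch dependency-free) and a synthetic log-like sample standing in for real data. sweep_presets is a hypothetical helper name, and timings will vary by machine:

```python
import lzma
import time


def sweep_presets(data: bytes):
    """Compress `data` at every xz preset (0-9); return (preset, size, c_sec, d_sec) tuples."""
    results = []
    for preset in range(10):
        start = time.perf_counter()
        packed = lzma.compress(data, preset=preset)  # .xz container, LZMA2 filter
        mid = time.perf_counter()
        restored = lzma.decompress(packed)
        end = time.perf_counter()
        if restored != data:  # round-trip sanity check
            raise RuntimeError(f"round-trip failed at preset {preset}")
        results.append((preset, len(packed), mid - start, end - mid))
    return results


if __name__ == "__main__":
    # Hypothetical sample: synthetic log lines, roughly 1 MB in total.
    sample = b"".join(
        b"2024-01-01 12:00:%02d INFO request id=%d status=ok\n" % (i % 60, i)
        for i in range(20_000)
    )
    for preset, size, c_sec, d_sec in sweep_presets(sample):
        print(f"xz -{preset}: {size:>8} B  compress {c_sec:.3f}s  decompress {d_sec:.3f}s")
```

Plotting size against compression time from such a sweep shows where each extra level stops buying much; running the same loop over the zstd CLI would be the like-for-like comparison.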

bracketfocus 345 days ago

This makes sense. A lot of end-users have internet speeds that can outpace the decompression speeds of heavily compressed files. Seems like there would be an irrational psychological aspect to it as well.

Unfortunately for the hoster, they either have to eat the cost of the added bandwidth from a larger file or have people complain about slow decompression.

vlovich123 345 days ago

Well, the difference is quite a bit more manageable in practice, since you’re talking about a single-digit percentage difference in size vs. a 2-100x difference in decompression speed.

sidewndr46 345 days ago

I definitely agree, I basically have unlimited time and unlimited CPU for decompressing. Available memory is huge too. The gains from xz were significant enough that I went with it.

landl0rd 345 days ago

I usually see zstd on max settings outperform xz on speed and very slightly on compression (though that's a tiny difference).

Szpadel 345 days ago

In my experience, zstd --long --ultra -22 gives a marginally better compression ratio than xz -9 while being significantly faster.

soruly 345 days ago

I think it depends on what you're compressing. I experimented with my data, which is full of hex-text XML files. xz -6 is both faster and smaller than zstd -19, by about 10%. For my data, xz -2 and zstd -17 achieve the same compressed size, but xz -2 is 3 times faster than zstd -17. I still use xz for archives because I rarely need to decompress them.

Szpadel 344 days ago

Try combining it with --long

My use cases are usually source code, SQL dumps and log files.

Sometimes xz gave marginally better results, but the difference was well below 1%.

soruly 344 days ago

Thanks for the tips. As my data has very low entropy, both can compress it down to 3-4% of the original size, but xz is a lot faster at compressing.

    raw size:                    9612344 B
    zstd --ultra -22 --long=31:   376181 B  (3.91% of original, 4.088s compress, 0.013s decompress)
    xz -z -9 xml:                 353700 B  (3.68% of original, 0.729s compress, 0.032s decompress)

zstd -17 --long=31 could match the compression time of xz, but the size is bigger (405602 B, 4.22% of original).

If you compare the compressed sizes directly (rather than as a fraction of the original size), the .zst files are about 6-15% larger than the .xz file.
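The percentages and the 6-15% range above follow directly from the byte counts. A small sketch that recomputes them, with the sizes transcribed from this comment (pct_of_raw and pct_larger are illustrative names, not part of either tool):

```python
RAW = 9_612_344  # bytes, soruly's uncompressed XML sample

SIZES = {  # compressed bytes as reported in the comment
    "zstd --ultra -22 --long=31": 376_181,
    "xz -z -9": 353_700,
    "zstd -17 --long=31": 405_602,
}


def pct_of_raw(size: int) -> float:
    """Compressed size as a percentage of the raw size."""
    return 100 * size / RAW


def pct_larger(size: int, baseline: int) -> float:
    """How much bigger `size` is than `baseline`, in percent."""
    return 100 * (size / baseline - 1)


if __name__ == "__main__":
    xz = SIZES["xz -z -9"]
    for name, size in SIZES.items():
        print(f"{name:28} {pct_of_raw(size):5.2f}% of raw, {pct_larger(size, xz):+5.1f}% vs xz")
```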

xxs 345 days ago

Do you have examples where xz 'blows it away', not just against zstd -3?

sidewndr46 345 days ago

Here are some examples of what I was doing in one case:

https://www.hydrogen18.com/blog/apk-the-strangest-format.htm...

I was running "zstd --ultra --threads=0", which I assumed was asking it for the absolute maximum.

sltkr 344 days ago

I think your mistake was to use --ultra without a compression level.

I redid your experiments with rust-wasm-1.83.0-r0.apk:

                        size (B)  % of raw  compress  decompress
    uncompressed:      290072064         -         -           -
    gzipped original:  105255109    36.29%         -           -
    bzip2 -9:          107099379    36.92%     21.1s       11.0s
    bzip3 -b511:        73539847    25.35%     28.9s       32.0s
    xz --extreme -9:    71010672    24.48%    142.0s        3.1s
    lzip -9:            70964413    24.46%    173.5s        5.3s
    zstd --ultra -22:   48288499    16.64%    155.6s        0.4s

It's pretty clear zstd blows everything else out of the water by a huge margin. And even though compressing with zstd is slightly slower than xz in this case (by less than 10%), decompression is nearly 8x as fast, and you can probably tweak the compression level to make zstd be both faster and better than xz.
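The Pareto-frontier point made earlier in the thread can be checked against this table: a tool is on the frontier if no other tool is both no larger and no slower to compress (and strictly better on one of the two). A minimal sketch using the numbers from the table; gzip is omitted because its timings weren't measured, and the helper names are illustrative:

```python
# (tool, compressed bytes, compression seconds, decompression seconds),
# copied from sltkr's table; gzip is omitted because no timings were given.
RESULTS = [
    ("bzip2 -9", 107_099_379, 21.1, 11.0),
    ("bzip3 -b511", 73_539_847, 28.9, 32.0),
    ("xz --extreme -9", 71_010_672, 142.0, 3.1),
    ("lzip -9", 70_964_413, 173.5, 5.3),
    ("zstd --ultra -22", 48_288_499, 155.6, 0.4),
]


def dominates(a, b) -> bool:
    """a dominates b if it is no larger and no slower to compress, and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(rows):
    """Keep the rows that no other row dominates on (size, compression time)."""
    return [r for r in rows if not any(dominates(o, r) for o in rows if o is not r)]


if __name__ == "__main__":
    print("Pareto-optimal:", [name for name, *_ in pareto_front(RESULTS)])
    xz, zstd = RESULTS[2], RESULTS[4]
    print(f"zstd compresses {100 * (zstd[2] / xz[2] - 1):.1f}% slower than xz")
    print(f"zstd decompresses {xz[3] / zstd[3]:.2f}x faster than xz")
```

On these figures only lzip -9 falls off the frontier, since zstd --ultra -22 is both smaller and faster to compress. The two ratios printed at the end come out at about 9.6% and 7.75x, in line with the "less than 10%" and "nearly 8x" above.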

ars 344 days ago

That was an impressive result, so I tried it on a huge email inbox.

    uncompressed:    1512662084
    xz --extreme -9:  508431572  12:47
    zstd --ultra -21: 508432560  12:44

(-22 ran out of memory.) So at least for me, zstd was identical to xz almost to the byte and to the second.

sltkr 344 days ago

It does really vary based on the data set.

If the email data is mostly text with markup (like HTML/XML), you might want to try bzip3 too.

It's also possible that a large part of your email is actually already-compressed binary data (like PDFs and images), possibly encoded in base64. In that case it's likely that all tools are pretty good at compressing the text and headers, but can do little to compress the attachments, which would explain why the results you get are so close.
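A synthetic illustration of this theory, assuming os.urandom bytes as a stand-in for already-compressed attachments (real PDFs and JPEGs are close to random but not exactly) and using xz via Python's lzma module rather than zstd. Base64 turns every 3 bytes into 4 characters, so a compressor can at best shrink the encoded text back to roughly the raw payload size, while repetitive header text compresses dramatically. The sample data and the xz_ratio helper are hypothetical:

```python
import base64
import lzma
import os


def xz_ratio(data: bytes) -> float:
    """Compressed size as a fraction of the input size, using .xz at preset 9."""
    return len(lzma.compress(data, preset=9)) / len(data)


payload = os.urandom(200_000)  # stand-in for an already-compressed attachment
attachment = base64.encodebytes(payload)  # MIME-style base64 with line breaks
headers = b"From: a@example.com\nSubject: hello\n\n" * 2_000  # repetitive text

print(f"random bytes:       {xz_ratio(payload):.3f}")
print(f"base64 of them:     {xz_ratio(attachment):.3f}")
print(f"repetitive headers: {xz_ratio(headers):.3f}")
```

Random bytes should come out at about 1.0, the base64 text at roughly 0.75, and the repetitive headers far below that: every tool gains on the text and almost nothing on the attachments.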

ars 344 days ago

    bzip3 -b511: 580771424  8:51

I suspect your theory about compressed attachments is correct, although bzip3 isn't doing very well compared to the rest.

ars 344 days ago

I got -22 to run:

    zstd --ultra -22: 494517545 14:00

Pretty minor difference.
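To put "pretty minor" in numbers, a short sketch over ars's figures. It assumes the reported times are minutes:seconds, which fits the "almost to the second" remark; RUNS and seconds are illustrative names:

```python
UNCOMPRESSED = 1_512_662_084  # bytes, ars's mailbox

# (compressed bytes, wall time as reported); times assumed to be minutes:seconds
RUNS = {
    "xz --extreme -9": (508_431_572, "12:47"),
    "zstd --ultra -21": (508_432_560, "12:44"),
    "zstd --ultra -22": (494_517_545, "14:00"),
    "bzip3 -b511": (580_771_424, "8:51"),
}


def seconds(mmss: str) -> int:
    """Convert an 'M:SS' string to seconds."""
    minutes, secs = mmss.split(":")
    return 60 * int(minutes) + int(secs)


if __name__ == "__main__":
    base_size, base_time = RUNS["xz --extreme -9"]
    for name, (size, t) in RUNS.items():
        print(
            f"{name:18} {100 * size / UNCOMPRESSED:5.2f}% of raw, "
            f"{100 * (size / base_size - 1):+6.2f}% size and "
            f"{100 * (seconds(t) / seconds(base_time) - 1):+6.1f}% time vs xz"
        )
```

With these inputs, -21 comes out 988 bytes larger than xz, and -22 about 2.7% smaller for roughly 9.5% more wall time.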

sidewndr46 344 days ago

I guess I misunderstood the man page for that option then.
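For reference, the zstd man page describes --ultra as unlocking levels 20-22 rather than selecting one, so without an explicit -N zstd stays at its default level (3). A hypothetical helper that always spells the level out might look like this; it only builds the argument list, zstd_argv is an illustrative name, and the flags used (--ultra, -N, --long, --threads) are the documented zstd CLI options:

```python
from __future__ import annotations


def zstd_argv(path: str, level: int = 22, long_window: int | None = 31) -> list[str]:
    """Build a zstd command line with an explicit compression level.

    --ultra only unlocks levels 20-22; it does not pick a level by itself,
    so without -N zstd compresses at its default level (3).
    """
    if not 1 <= level <= 22:
        raise ValueError("this helper only handles regular levels 1-22")
    argv = ["zstd"]
    if level >= 20:
        argv.append("--ultra")  # needed before zstd will accept -20 .. -22
    argv.append(f"-{level}")
    if long_window is not None:
        # Long-distance matching with a 2**long_window byte window; archives made
        # with a window above 2**27 need --long=N (or a larger --memory) to decompress.
        argv.append(f"--long={long_window}")
    argv += ["--threads=0", path]  # 0: let zstd detect the number of CPU cores
    return argv


if __name__ == "__main__":
    print(" ".join(zstd_argv("rust-wasm-1.83.0-r0.apk")))
```

If zstd is installed, the list can be passed straight to subprocess.run(argv, check=True). Because --long=31 raises memory use on both ends, passing long_window=None is the safer choice for archives other people will decompress.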

xxs 344 days ago

Yup, you should have just tried different -NN levels and compared. I gave a talk on zstd a couple of years back, and one of the points was that it was better than xz across the board.