Hacker News
by WinonaRyder 2360 days ago
Zstandard is awesome!

Early last year I was doing some research that involved repeatedly grepping through over a terabyte of data, most of it tiny text files that I first had to extract from zip/7zip/rar/tar archives, and it was painful (maybe I needed a better laptop).

With Zstd I was able to re-compress the whole thing down to a few hundred gigs and use ripgrep, which solved the problem beautifully.
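For anyone curious what "grep through compressed files" amounts to, here is a minimal sketch in Python. It is not what ripgrep does internally, and it uses gzip from the standard library as a stand-in for zstd (an assumption made only so the example runs without third-party packages). The function name `grep_compressed` is made up for illustration.

```python
import gzip
import re
from pathlib import Path


def grep_compressed(root, pattern, suffix=".gz"):
    """Yield (path, line_number, line) for every matching line under root.

    Assumption: files are gzip-compressed text; a zstd binding could be
    swapped in for gzip.open if one is installed.
    """
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        # Decompress as a stream; nothing is unpacked to disk.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path, line_number, line.rstrip("\n")
```

With real zstd files, ripgrep's `-z` (`--search-zip`) option covers the same ground from the command line, so a script like this is only worth it when you need custom handling of the matches.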

Out of curiosity I tested compression with (single-threaded) lz4 and found that multi-threaded zstd was pretty close in speed. It was an unscientific and maybe unfair test, but I found it amazing that I could get lz4-ish compression speeds, at the cost of more CPU, with much better compression ratios.

EDIT: Btw, I use arch :) - yes, on servers too.

1 comments

Here's a compression benchmark.

http://pages.di.unipi.it/farruggia/dcb/

Looks like Snappy beats both LZ4 and Zstd in compression speed and compression ratio, by a huge margin.

LZ4 is ahead of Snappy in decompression speed.

Similar to how code is read more times than it is written, files are decompressed more times than compressed.

I have not researched this opinion much.

I find these numbers for Snappy entirely implausible.

The numbers there that I can check against what I know are wrong: zstd always beats gzip for compression ratio.

I will need to do my own testing.

I have tested snzip 1.0.4.

It compresses about as well as lz4, but more slowly. It also decompresses more slowly.

It is faster than zstd -1, but compresses less well.

It is possible that it does better with certain kinds of data, but 12x remains implausible.
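For anyone who wants to repeat this kind of test on their own data, a rough harness could look like the sketch below. It is not the procedure used for the numbers above. The codecs are standard-library stand-ins (zlib, bz2, lzma), on the assumption that zstd, lz4 and Snappy bindings are third-party packages that can be added to the `CODECS` table in the same way. The names `CODECS` and `benchmark` are made up for illustration.

```python
import bz2
import lzma
import time
import zlib

# Stand-in codecs from the standard library. Assumption: third-party
# bindings for zstd, lz4 or Snappy would be added as further
# (compress, decompress) pairs.
CODECS = {
    "zlib-1": (lambda data: zlib.compress(data, 1), zlib.decompress),
    "zlib-9": (lambda data: zlib.compress(data, 9), zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def benchmark(data, codecs=CODECS):
    """Return one row per codec: name, ratio, compress and decompress seconds."""
    rows = []
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        compress_seconds = time.perf_counter() - start

        start = time.perf_counter()
        unpacked = decompress(packed)
        decompress_seconds = time.perf_counter() - start

        # A benchmark that never checks the round trip can report nonsense.
        if unpacked != data:
            raise ValueError(f"{name} failed to round-trip")
        rows.append({
            "codec": name,
            "ratio": len(data) / len(packed),
            "compress_s": compress_seconds,
            "decompress_s": decompress_seconds,
        })
    return rows


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog\n" * 20000
    for row in benchmark(sample):
        print(f"{row['codec']:8} ratio={row['ratio']:7.1f} "
              f"c={row['compress_s']:.4f}s d={row['decompress_s']:.4f}s")
```

A single timed run on repetitive sample text like this is noisy and flatters every codec, so treat the output as a sanity check and feed it your real files before drawing conclusions.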

Apparently the current file format has suffix ".sz".