
by valarauca1 3360 days ago

ZSTD.

It is superior to Brotli in most categories (decompression speed, compression ratio, and compression speed). The real issue with Brotli is the second-order context modeling (compression level >8), which costs you ~50% of your compression speed for less than a ~1% gain in ratio [1].

I've spoken to the author about this on Twitter. They're planning on expanding Brotli's dictionary features and context modeling in future versions.

Overall it isn't a bad algorithm. Brotli and ZSTD are head and shoulders above LZMA/LZMA2/XZ, pulling off comparable compression ratios in half to a quarter of the time [1]. They make GZip and Bzip2 look outdated (which, frankly, it's about time).

ZSTD really just needs a way to package dictionaries WITH archives.

[1] These are just based on personal benchmarks I ran while building a tar clone that supports zstd/brotli files: https://github.com/valarauca/car


terrelln 3360 days ago

What use case do you have in mind for packaging dictionaries with archives? There is an ongoing discussion about a jump table format that could contain dictionary locations [1].

[1] https://github.com/facebook/zstd/issues/395


valarauca1 3360 days ago

For large files (>1 GiB), a library + archive is often smaller than the archive on its own.


terrelln 3360 days ago

How are you compressing the data?

I would expect a dictionary to be useful if the data is broken into chunks, and each chunk is compressed individually.

If the data is compressed as one frame, I would be very interested in an example where the dictionary helps.
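To make the chunked case concrete, here is a minimal sketch. It is not zstd: it uses the preset-dictionary feature (`zdict`) of Python's standard `zlib` module as a stand-in, and the records, the dictionary contents, and the helper names `compress_chunk` / `decompress_chunk` are all made up for illustration. It only shows the general effect: small chunks compressed individually benefit a lot from a shared dictionary, while a single stream already sees the redundancy on its own.

```python
import zlib
from typing import Optional


def compress_chunk(chunk: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress one chunk on its own, optionally primed with a preset dictionary."""
    if zdict is not None:
        comp = zlib.compressobj(level=6, zdict=zdict)
    else:
        comp = zlib.compressobj(level=6)
    return comp.compress(chunk) + comp.flush()


def decompress_chunk(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Inverse of compress_chunk; the same dictionary must be supplied."""
    if zdict is not None:
        decomp = zlib.decompressobj(zdict=zdict)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical data set: many small records that share most of their structure.
records = [
    ('{"user_id": %d, "status": "active", "country": "US", "plan": "premium"}' % i).encode()
    for i in range(200)
]

# Hypothetical dictionary: just one representative record.
dictionary = b'{"user_id": 0, "status": "active", "country": "US", "plan": "premium"}'

# Case 1: every record is its own chunk, no dictionary.
chunked_plain = sum(len(compress_chunk(r)) for r in records)

# Case 2: every record is its own chunk, primed with the shared dictionary.
chunked_dict = sum(len(compress_chunk(r, dictionary)) for r in records)

# Case 3: everything compressed as a single stream, no dictionary.
single_stream = len(compress_chunk(b"\n".join(records)))

print("chunked, no dictionary:  ", chunked_plain)
print("chunked, with dictionary:", chunked_dict)
print("single stream:           ", single_stream)
```

In the chunked cases each record has no history to draw on except the dictionary, which is why the dictionary pays off there. In the single-stream case the compressor builds up the same context from the data itself, so a dictionary would mostly help only with the very start of the stream.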


JyrkiAlakuijala 3357 days ago

In my benchmarks, brotli compresses more densely and typically reaches a given density faster, but decompresses more slowly.

I benchmark with internet-like loads, not with 50-1000 MB compression research corpora.
