| HN Mirror


	by roncohen 2011 days ago
	another contender is zstd: https://github.com/facebook/zstd. It typically offers better compression ratios than LZ4 at a slight (depending on your data) cost in speed. Additionally it offers a training mode to tune the algorithm to increase compression ratio on specific types of data, particularly useful for compression of small pieces of data.

5 comments

moonchild 2011 days ago

> It typically offers better compression ratios than LZ4 at a slight (depending on your data) cost in speed

Per the table at [0], zstd provides only a slight improvement in compression ratio, and in exchange is about half the speed of lz4.

They both have their place.

0. https://facebook.github.io/zstd/

hyperpape 2011 days ago

That table shows zstd comparing poorly at its faster settings, but at slower settings, it offers a significantly better compression ratio, albeit 3x slower decompression.

baybal2 2011 days ago

LZ4 has branchless decompression and a lower cache footprint, so it works equally well on low-end and non-desktop CPUs.

zstd, brotli, and snappy were seemingly all made with high-end x86 capabilities in mind.

_vvhw 2011 days ago

I also appreciate LZ4's simplicity and tiny code footprint.

zstd is brilliant as well, but in terms of code base it's a whole other beast.

makapuf 2011 days ago

Yes, the decompression code on a bare-metal Cortex-M4 is a mere few hundred bytes, and you can decompress from flash directly into the output buffer.
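
To illustrate how small the decode loop is, here is a minimal Python sketch of an LZ4 block decoder, written from the published block format (a token byte, optional length extension bytes, literals, a 2-byte little-endian offset, and a minimum match length of 4). It is not the reference implementation: the name `lz4_block_decode` is made up for this sketch, it does no bounds or offset validation, and it handles raw blocks only, not the LZ4 frame format.

```python
def lz4_block_decode(src: bytes) -> bytes:
    """Decode one raw LZ4 block (sketch: no validation of offsets or bounds)."""
    out = bytearray()
    i = 0
    while i < len(src):
        token = src[i]
        i += 1

        # High nibble: literal run length; 15 means "more length bytes follow".
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                b = src[i]
                i += 1
                lit_len += b
                if b != 255:
                    break
        out += src[i:i + lit_len]
        i += lit_len

        # The last sequence of a block carries literals only.
        if i >= len(src):
            break

        # 2-byte little-endian offset back into the already decoded output.
        offset = src[i] | (src[i + 1] << 8)
        i += 2

        # Low nibble: match length minus 4, extended the same way as literals.
        match_len = token & 0x0F
        if match_len == 15:
            while True:
                b = src[i]
                i += 1
                match_len += b
                if b != 255:
                    break
        match_len += 4

        # Byte-by-byte copy so that overlapping matches (offset < length) work.
        for _ in range(match_len):
            out.append(out[-offset])
    return bytes(out)


# Hand-built block: 4 literals "abcd", a match (offset 4, length 8), 5 literals.
block = bytes([0x44]) + b"abcd" + bytes([0x04, 0x00, 0x50]) + b"eeeee"
print(lz4_block_decode(block))  # b'abcdabcdabcdeeeee'
```

The decoder only ever reads the input sequentially and copies from output it has already written, which is why decoding from flash straight into a RAM buffer needs no extra working memory.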

Teknoman117 2011 days ago

I've used it in bootloaders that have slow transfer mechanisms (UART, I2C) to get whatever speedup I can in exchange for a few hundred bytes of binary.

wolf550e 2011 days ago

Google's Snappy is in the same class as LZO and LZ4, not in the same class as Brotli and zstd.

ignoramous 2011 days ago

Also see Daniel Reiter Horn's DivANS built at Dropbox: https://dropbox.tech/infrastructure/building-better-compress...

unsigner 2011 days ago

Zstd is very different - it includes an entropy coder. LZ4 only finds repeated matches, but then doesn't encode them very efficiently.

To put it simplistically, if you have a file which is a (good) random mix of an equal number of A and B characters, LZ4 won't be able to compress it significantly, while Zstd will compress it 8:1, converging to an encoding where a '1' bit is A and a '0' bit is B.
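
The 8:1 figure can be checked with a few lines of Python. This is only a sketch of the limit an entropy coder converges to, not zstd's actual coder: it measures the Shannon entropy of a random A/B buffer (close to 1 bit per 8-bit character) and then packs one symbol per bit by hand. The helper names `shannon_bits_per_symbol`, `pack_ab`, and `unpack_ab` are made up for the illustration.

```python
import math
import random
from collections import Counter


def shannon_bits_per_symbol(data: bytes) -> float:
    """Empirical order-0 entropy of the byte values in data."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def pack_ab(data: bytes) -> bytes:
    """Pack 'A' as a 1 bit and 'B' as a 0 bit, eight symbols per output byte."""
    if len(data) % 8:
        raise ValueError("sketch only handles lengths that are a multiple of 8")
    out = bytearray()
    for i in range(0, len(data), 8):
        byte = 0
        for ch in data[i:i + 8]:
            byte = (byte << 1) | (1 if ch == ord("A") else 0)
        out.append(byte)
    return bytes(out)


def unpack_ab(packed: bytes) -> bytes:
    """Inverse of pack_ab: most significant bit first."""
    return bytes(
        ord("A") if (byte >> shift) & 1 else ord("B")
        for byte in packed
        for shift in range(7, -1, -1)
    )


rng = random.Random(0)
data = bytes(rng.choice(b"AB") for _ in range(8000))
packed = pack_ab(data)

print("entropy (bits/char):", shannon_bits_per_symbol(data))
print("ratio:", len(data) / len(packed))  # 8.0 by construction
assert unpack_ab(packed) == data
```

With two equally likely symbols, one bit per character is the best any order-0 coder can do, so 8:1 is the ceiling for this input rather than something zstd is guaranteed to hit exactly.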

YetAnotherNick 2011 days ago

> To put it simplistically, if you have a file which is a (good) random mix of an equal number of A and B characters, LZ4 won't be able to compress it significantly

I checked it. LZ4 still reduces the size by half; no idea why half. A 10 MB file compresses to 5 MB.

Edit: I checked with the highest compression level, and it compresses a 1 MB file to 185 KB. So what the parent wrote is false.

not2b 2011 days ago

Yes, if I take the 8 combinations aaa, aab, aba, etc., and assign each of them a 9-bit codeword, I replace each 24-bit sequence with a 9-bit sequence. So arithmetic coders have no problem with cases like this.

unsigner 2006 days ago

But LZ4 doesn't have an arithmetic coder, or any other statistical encoding. It's just matches and literals. Puzzling...
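
One possible explanation, sketched in Python below: with only two distinct characters there are just 16 possible 4-byte sequences, so a match-only coder finds a match of at least LZ4's minimum length at almost every position, and each match is cheaper than the bytes it replaces. The code is a toy greedy matcher, not LZ4 itself. The names `toy_lz_tokens`, `toy_lz_decode`, and `rough_cost` are made up, and the cost model (1 byte per literal, 3 bytes per match for a token plus a 2-byte offset) is an assumption that only approximates the real format.

```python
import random

MIN_MATCH = 4  # LZ4's minimum match length


def toy_lz_tokens(data: bytes):
    """Greedy matcher with one candidate per 4-byte prefix and no entropy coding."""
    table = {}   # 4-byte prefix -> most recent position where it started
    tokens = []  # ("lit", byte) or ("match", offset, length)
    i = 0
    while i < len(data):
        key = data[i:i + MIN_MATCH]
        j = table.get(key)
        table[key] = i
        if j is None or len(key) < MIN_MATCH:
            tokens.append(("lit", data[i]))
            i += 1
            continue
        # Extend the match past the first four bytes for as long as it agrees.
        length = MIN_MATCH
        while i + length < len(data) and data[j + length] == data[i + length]:
            length += 1
        tokens.append(("match", i - j, length))
        i += length
    return tokens


def toy_lz_decode(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, offset, length = tok
            for _ in range(length):  # byte-wise copy handles overlapping matches
                out.append(out[-offset])
    return bytes(out)


def rough_cost(tokens) -> int:
    """Assumed cost model: 1 byte per literal, 3 bytes per match."""
    return sum(1 if tok[0] == "lit" else 3 for tok in tokens)


rng = random.Random(1)
data = bytes(rng.choice(b"AB") for _ in range(20000))
tokens = toy_lz_tokens(data)
assert toy_lz_decode(tokens) == data
print("input bytes:", len(data), "estimated output bytes:", rough_cost(tokens))
```

In this toy, a match against a single recent candidate extends by about one extra byte on average, since each further byte agrees with probability one half, so roughly five input bytes cost three output bytes. That is in the neighbourhood of the "half" observed above. A slower mode that searches for the longest available match would find much longer matches at the same per-match cost, which is consistent with the better ratio at the highest level, though this sketch does not model that search.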

julian37 2011 days ago

Yep, Zstd is the spiritual successor to LZ4 and written by the same person (Yann Collet) after they got hired by Facebook.

smueller1234 2011 days ago

Actually, I seem to recall that he was working on it before getting hired by Facebook (unless there was a massive delay in the hiring to become known). I was following his excellent blog posts on the matter at the time.

tmd83 2011 days ago

Yes, it was a fully working thing before Facebook. There has been a lot of improvement in both the core and the CLI, but the core innovations of zstd were well established before Facebook. I had probably been following his blog (even though I wasn't a compression expert) for months before I saw the post about him joining Facebook.

thechao 2011 days ago

Yann wrote LZ4 and Zstd well before joining FB. I have to applaud FB for supporting Yann's work, though.

zrav 2011 days ago

I've spent an afternoon testing zstd's custom dictionaries. They really only provide benefits on small data blocks. According to my tests, the largest block size at which custom dictionaries still provide a benefit is 8K; above that, the compression ratio advantage over the default is definitely gone.
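
The same effect can be reproduced with Python's standard library. Note the swap: this sketch uses zlib's preset dictionary (`zdict`) as a stand-in, not zstd's trained dictionaries, and the "dictionary" here is simply one sample record rather than the output of any training step. The record layout and the helper names `compressed_size` and `record` are made up for the example.

```python
import json
import zlib


def compressed_size(payload, zdict=None):
    """Size of payload after zlib compression, optionally primed with a dictionary."""
    comp = zlib.compressobj() if zdict is None else zlib.compressobj(zdict=zdict)
    return len(comp.compress(payload) + comp.flush())


def record(user_id):
    """Hypothetical small JSON record; most of its bytes repeat across records."""
    return json.dumps(
        {"user_id": user_id, "event": "page_view", "country": "US", "browser": "firefox"}
    ).encode()


shared_dict = record(0)  # stand-in for a trained dictionary
small = record(1234)     # one record, well under 100 bytes
large = b"\n".join(record(i) for i in range(2000))  # far beyond 8K

for name, payload in (("small", small), ("large", large)):
    plain = compressed_size(payload)
    primed = compressed_size(payload, shared_dict)
    print(f"{name}: raw={len(payload)} plain={plain} with_dict={primed}")
```

For the single record, nearly everything can be encoded as a reference into the dictionary. For the large payload, the compressor learns the same patterns from the data itself after the first record, so the dictionary has little left to contribute.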

StreamBright 2011 days ago

> Additionally it offers a training mode to tune the algorithm to increase compression ratio on specific types of data

Yes, but most tools that use zstd offer no facility for training the algorithm on your own data.

pmarreck 2011 days ago

There should be a way to pool standard dictionaries somewhere, such as a "standard english text corpus data" dictionary, that you can then download on demand for encoding, say, BLOB text fields in a database with little to no overhead.

Without such a facility, the way this would probably work in, say, a database is that the dictionary is maintained internally, constructed on the fly from the field data, and not exposed to users. Although, I don't know if you'd have to keep every version of the dictionary in order to successfully decompress old data? If so, then perhaps this is a niche feature.
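
On the versioning question: data compressed with a dictionary can only be decompressed with that same dictionary, so a store that rotates dictionaries does have to retain every version that still has live data, and record which version each blob used. A minimal sketch of that bookkeeping follows. It again uses zlib's `zdict` as a stand-in for zstd dictionaries, and the `DictStore` class with its integer version ids is entirely hypothetical.

```python
import zlib


class DictStore:
    """Hypothetical store that keeps every dictionary version so old blobs stay readable."""

    def __init__(self):
        self.versions = {}   # version id -> dictionary bytes
        self.current = None  # version id used for new writes

    def add_version(self, version, dictionary):
        self.versions[version] = dictionary
        self.current = version

    def compress(self, payload):
        # Return the version id alongside the blob; both must be persisted.
        comp = zlib.compressobj(zdict=self.versions[self.current])
        return self.current, comp.compress(payload) + comp.flush()

    def decompress(self, version, blob):
        decomp = zlib.decompressobj(zdict=self.versions[version])
        return decomp.decompress(blob) + decomp.flush()


store = DictStore()
store.add_version(1, b'{"name": "", "city": ""}')
old = store.compress(b'{"name": "ada", "city": "london"}')

store.add_version(2, b'{"name": "", "city": "", "zip": ""}')
new = store.compress(b'{"name": "bob", "city": "paris", "zip": "75001"}')

# Both blobs decode because version 1 was retained after version 2 was added.
print(store.decompress(*old))
print(store.decompress(*new))
```

A dictionary version can be dropped only once no stored blob references it, for example after old data has been rewritten with a newer dictionary.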

felixhandte 2011 days ago

W.r.t. standard dictionaries, it's something we're interested in, but the fundamental reality of dictionaries is that their effectiveness is strongly tied to their specificity. Put another way, a universal dictionary is a self-contradiction.

And yes, totally, I know at least RocksDB supports exactly that behavior [0].

[0] https://github.com/facebook/rocksdb/blob/12f11373554af219c51...
