by unsigner 1964 days ago
Zstd is very different - it includes an entropy coder. LZ4 only finds repeated matches, but then doesn't encode them very efficiently.

To put it simplistically, if you have a file that is a (good) random mix of an equal number of A and B characters, LZ4 won't be able to compress it significantly, while Zstd will compress it 8:1, converging to an encoding where a '1' bit is A and a '0' bit is B.
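Where the 8:1 figure comes from: each character is stored in 8 bits, but a fair random choice between two symbols carries only 1 bit of information. A minimal sketch, assuming an order-0 (per-character) entropy model, which is the relevant bound when the characters really are independent; the sample size and seed are arbitrary.

```python
import math
import random
from collections import Counter


def entropy_bits_per_symbol(data: str) -> float:
    """Empirical order-0 Shannon entropy of `data`, in bits per character."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def best_order0_ratio(data: str) -> float:
    """Ratio an ideal order-0 entropy coder could approach,
    assuming each input character is stored in 8 bits."""
    h = entropy_bits_per_symbol(data)
    if h == 0:
        return math.inf  # a single repeated symbol carries no information
    return 8 / h


# Equal numbers of 'A' and 'B', shuffled into random order.
rng = random.Random(42)
chars = list("A" * 100_000 + "B" * 100_000)
rng.shuffle(chars)
sample = "".join(chars)

print(f"{entropy_bits_per_symbol(sample):.3f} bits/char")
print(f"~{best_order0_ratio(sample):.1f}:1")
```

With exactly equal counts the entropy is 1 bit per character, so an entropy coder such as the one in Zstd can approach 8:1, while a coder that only emits matches and literals has no direct way to spend less than a byte per literal.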

> To put it simplistically, if you have a file which is a (good) random mix of an equal number A and B characters, LZ4 won't be able to compress it significantly

I checked it. LZ4 still reduces the size by half, and I have no idea why half: a 10 MB file compresses to 5 MB.

Edit: I checked with the highest compression level, and it compresses a 1 MB file to 185 KB. So what the parent wrote is false.
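To reproduce the experiment, a minimal Python sketch for generating such a test file. The function names, file name and seed are illustrative choices, not what the commenter actually used; the exact counts of A and B match the parent's "equal number" setup.

```python
import random


def make_ab_data(size: int, seed: int = 0) -> bytes:
    """Return `size` bytes with an equal number of b'A' and b'B' in random order."""
    if size % 2:
        raise ValueError("size must be even so the counts can match exactly")
    rng = random.Random(seed)
    data = bytearray(b"A" * (size // 2) + b"B" * (size // 2))
    rng.shuffle(data)  # in-place Fisher-Yates shuffle; works on a bytearray
    return bytes(data)


def write_ab_file(path: str, size: int, seed: int = 0) -> None:
    """Write the shuffled A/B data to `path`."""
    with open(path, "wb") as f:
        f.write(make_ab_data(size, seed))
```

For example, `write_ab_file("ab_1mb.bin", 1_000_000)` produces a 1 MB file that can then be fed to the lz4 and zstd command-line tools at their default and highest levels.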

Yes: if I take the 8 combinations aaa, aab, aba, etc. and assign each of them a 3-bit codeword, I replace each 24-bit sequence with a 3-bit sequence. So arithmetic coders have no problem with cases like this.
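The grouping idea above, as a small Python sketch: 8 possible three-character groups need only 3-bit codewords (2^3 = 8), so every 24 input bits become 3 output bits, which is the same 8:1 ratio. Note this is a fixed-length block code, not an arithmetic coder; it reaches the optimum here only because all 8 groups are equally likely. The names `GROUPS`, `CODE`, `encode` and `decode` are hypothetical, chosen for illustration.

```python
from itertools import product

# All 8 three-character groups ("AAA", "AAB", ..., "BBB"),
# each mapped to a distinct 3-bit code 0..7.
GROUPS = ["".join(p) for p in product("AB", repeat=3)]
CODE = {g: i for i, g in enumerate(GROUPS)}


def encode(text: str) -> tuple[int, int]:
    """Pack `text` (length a multiple of 3) into an int, 3 bits per group.
    Returns (packed_bits, bit_length)."""
    if len(text) % 3:
        raise ValueError("length must be a multiple of 3")
    bits = 0
    for i in range(0, len(text), 3):
        bits = (bits << 3) | CODE[text[i:i + 3]]
    return bits, len(text)  # 3 bits per 3 characters = 1 bit per character


def decode(bits: int, bit_length: int) -> str:
    """Invert `encode`, reading 3-bit codes from the most significant end."""
    groups = []
    for shift in range(bit_length - 3, -1, -3):
        groups.append(GROUPS[(bits >> shift) & 0b111])
    return "".join(groups)
```

Round-tripping `encode("AABBBAABA")` through `decode` gives back the original 9 characters stored in 9 bits instead of 72.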
But LZ4 doesn't have an arithmetic coder, or any other statistical encoding; it's just matches and literals. Puzzling...