by nerdponx 3287 days ago
It's amazing how something useful and innovative gets invented, and isn't so much as documented, let alone patented or published in a journal.
3 comments

LZ4 is only one of the many variations on the basic principle of the LZ family of algorithms, which is to replace repeated sequences with references to where they were before.
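To make "references to where they were before" concrete, here is a minimal decoder sketch. The token format is made up for illustration (a bytes literal, or an `(offset, length)` pair) and is not the actual LZ4 block format; `lz_decode` is a hypothetical name.

```python
def lz_decode(tokens):
    """Expand illustrative LZ-style tokens into bytes.

    A token is either a bytes literal, or an (offset, length) pair meaning
    "copy `length` bytes starting `offset` bytes back in the output so far".
    This token layout is an assumption for the example, not LZ4's format.
    """
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, bytes):
            out += tok  # literal run: emitted as-is
        else:
            offset, length = tok
            if not 0 < offset <= len(out):
                raise ValueError("offset points outside the output so far")
            start = len(out) - offset
            # Copy one byte at a time so a match may overlap the bytes it
            # is producing (length > offset repeats the pattern).
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)


print(lz_decode([b"abc", (3, 6), b"d"]))  # b'abcabcabcd'
```

The overlapping copy (length 6 with offset 3) is what lets a single reference stand for a pattern repeated many times.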

What does amaze me a little is that the rather more complex Huffman algorithm was published and implemented decades before LZ.

The core idea of LZ, however, has been known for centuries: https://en.wikipedia.org/wiki/Iteration_mark

This is true: there is little algorithmic novelty in lz4.

However, that's missing the point: lz4's decompressor is simultaneously simple and the fastest thing around, at least at the moment.

In fact, it is so fast that it can be used to accelerate local data transfers over "slow" links like bonded 10 Gbit Ethernet and arrays of PCIe SSDs.

> lz4's decompressor is simultaneously simple and the fastest thing around, at least at the moment

The proprietary LzTurbo [0] and free Lizard [1] both claim to be faster at decompression while having better compression ratios.

[0] https://github.com/powturbo/TurboBench

[1] https://github.com/inikep/lizard

Doesn't that also explain why it came later? The low compression ratio is useful nowadays, with large throughputs.

Maybe. It might just be that the tricks it plays matter a lot on newer CPUs, and older fast compressors played older tricks.

I think the ratio of CPU cycles to I/O bandwidth is what really matters. Presumably the optimal tradeoff between CPU throughput and compression ratio depends on that and varies over time.
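A back-of-the-envelope sketch of that tradeoff, assuming reads and decompression are pipelined so the slower stage sets the pace. The function name and all the numbers below are hypothetical, chosen only to show the crossover, not measurements of any real codec.

```python
def effective_throughput(io_bw, ratio, decompress_speed):
    """Uncompressed bytes delivered per second when reading compressed data.

    io_bw            -- link or disk bandwidth (compressed bytes per second)
    ratio            -- uncompressed size / compressed size
    decompress_speed -- decompressor output rate (uncompressed bytes per second)

    Assumes I/O and decompression overlap, so the slower stage is the limit.
    """
    return min(io_bw * ratio, decompress_speed)


# Hypothetical codecs, in MB/s: (compression ratio, decompression speed).
fast_codec = (2, 3000)  # modest ratio, very fast decoder
slow_codec = (3, 500)   # better ratio, slower decoder

for io_bw in (1000, 100):
    fast = effective_throughput(io_bw, *fast_codec)
    slow = effective_throughput(io_bw, *slow_codec)
    print(f"link {io_bw} MB/s: fast codec {fast} MB/s, slow codec {slow} MB/s")
```

With these made-up numbers the fast codec wins on the 1000 MB/s link (2000 vs 500) and the higher-ratio codec wins on the 100 MB/s link (300 vs 200), which is the sense in which the best choice shifts as the CPU-to-I/O balance changes.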

The low compression ratio / fast speed trade-off has always been valuable.

For example, block-level filesystem compression (what lz4 is used for with zfs) has been valuable for filesystems essentially forever.

There's always been a need for very high throughput on the local filesystem.

Iteration mark is more similar to RLE than to LZ. LZ can copy from arbitrary points in the past.
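A small sketch of why RLE is the weaker scheme: it can only say "repeat the previous symbol N times", so it gains nothing on a repeated multi-byte pattern. `rle_encode` and `rle_decode` are illustrative names, and the `(count, byte)` pair layout is an assumption for the example.

```python
from itertools import groupby


def rle_encode(data: bytes):
    """Run-length encode into (count, byte_value) pairs."""
    return [(len(list(group)), byte) for byte, group in groupby(data)]


def rle_decode(runs):
    """Inverse of rle_encode."""
    return bytes(byte for count, byte in runs for _ in range(count))


# A run of one symbol collapses nicely...
print(rle_encode(b"aaaaaaaab"))       # [(8, 97), (1, 98)]
# ...but a repeated pattern does not: every byte becomes its own run.
print(len(rle_encode(b"abcabcabc")))  # 9
```

An LZ-style scheme could represent the second input as the literal `abc` plus one back-reference (go back 3 bytes, copy 6), because its references may point at any earlier position rather than only at the previous symbol.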
LZ4 deserves a lot of respect, but it is merely a recent improvement on a very old idea.

It wasn't novel when Lempel and Ziv described it in 1977 - the encoding idea itself is almost trivial, and had been described before. However, they did prove the conditions under which this compression is asymptotically optimal, which was NOT at all clear or trivial at the time - and it is therefore named after them.

LZ4 is an implementation of the LZ77 idea that optimized run time first and compression ratio second. It is elegant and successful - but it has little novelty.

The author has actually written a lot of great material on his blog: https://fastcompression.blogspot.ca/p/lz4.html?m=1

Not sure what you'd consider suitably documented beyond what is already out there. Not being patented or published in a journal look like positives to me.