Hacker News
by deepsun 1106 days ago
> selects the optimal algorithm

Here's the catch: how does the "system" know which algorithm would be the best? It could try encoding it with multiple algorithms and see which one is shorter, but that's extra CPU.

And the "system" can be called a compression algorithm itself.
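
To make the "try them all and keep the shortest" idea concrete, here is a minimal sketch in Python. The codec table, the one-byte tag format, and the names `compress_best` / `decompress_best` are all assumptions made up for illustration, not anything from the system being discussed. Note that once you add the tag so a decoder knows what was picked, the selector really is just another compression algorithm.

```python
import bz2
import lzma
import zlib

# Hypothetical codec table: (name, compress, decompress).
# Index 0 is "store as-is", so the output is never more than 1 byte larger than the input.
CODECS = [
    ("raw", lambda b: b, lambda b: b),
    ("zlib", zlib.compress, zlib.decompress),
    ("bz2", bz2.compress, bz2.decompress),
    ("lzma", lzma.compress, lzma.decompress),
]


def compress_best(data: bytes) -> bytes:
    """Run every codec, keep the shortest output, and prefix a 1-byte codec tag."""
    best_tag, best = 0, data
    for tag, (_name, comp, _decomp) in enumerate(CODECS):
        out = comp(data)  # this is the extra CPU: one full pass per codec
        if len(out) < len(best):
            best_tag, best = tag, out
    return bytes([best_tag]) + best


def decompress_best(blob: bytes) -> bytes:
    """Read the tag byte and dispatch to the matching decompressor."""
    tag, payload = blob[0], blob[1:]
    return CODECS[tag][2](payload)
```

The cost is visible in the loop: compression time is roughly the sum of all the codecs, while decompression only pays for the one that won.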

3 comments

I mean yeah, that's basically what high-compression solutions like paq have done: depending on the compression level desired, apply increasingly speculative and computationally intensive models to the block and pick whichever one worked best.
And then, when nobody wants to implement all the compression algorithms in a compressor or decompressor, we end up with files out in the wild that only pick one of them anyway.
OpenZFS' integration of Zstandard uses LZ4 as a "compression canary" for higher Zstandard compression levels: they feed each data block through LZ4 first and, only if LZ4 compresses it enough, feed it through Zstandard.

This relies on LZ4 being very fast, especially with its early exit on incompressible data.

Overall this turns out to be a win: you lose a little bit of compression in exchange for a huge decrease in CPU compared with running the same Zstandard compression on all the blocks.
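
A rough sketch of the canary idea, using only the Python standard library. This is not the OpenZFS code: `zlib` at level 1 stands in for LZ4, `lzma` stands in for a high Zstandard level, and the `MIN_SAVING` threshold and the name `compress_block` are assumptions picked for illustration.

```python
import lzma
import zlib

# Assumed threshold: the cheap probe must save at least 12.5% of the block.
MIN_SAVING = 0.125


def compress_block(block: bytes) -> tuple[str, bytes]:
    """Return (method, payload), where method is "raw" or "strong"."""
    # Cheap probe: zlib level 1 is a stand-in for LZ4 here.
    probe = zlib.compress(block, 1)
    if len(probe) > len(block) * (1 - MIN_SAVING):
        # The probe barely helped, so assume the block is incompressible
        # and skip the expensive pass entirely.
        return "raw", block
    # Expensive pass: lzma is a stand-in for a high Zstandard level.
    return "strong", lzma.compress(block)
```

The "little bit of compression" you lose comes from blocks the cheap probe gives up on but the strong compressor could still have squeezed; those get stored raw.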

It would still pay off in many situations. With existing algorithms you can already optimize image files to be compressed as much as possible. That takes quite a bit longer than usual, but if it means 30% smaller files for an entire website, it has an impact on every visitor.