Hacker News new | ask | show | jobs
by fuzzy2 1927 days ago
That’s not the case here. What you describe is basically compression in “solid blocks”. `xz`, however, is (in practice virtually always) fully solid. It allows greater compression efficiency at the expense of random access, which is not possible at all.

The order matters because it “defines” what the algorithm’s dictionary will look like.

1 comments

I think it's right regarding the limited horizon of the window. Any dynamic compression scheme that evolves knowledge of the input stream will have this property. The tar block being compressed here is many orders of magnitude larger than the window, so the order in which the compressor sees the stream matters.

It is somewhat more complicated because e.g. dynamic Huffan coding rarely applied to raw streams but to a series of one or more encoders such as run-length, but the reasoning remains that if the window or coded/dynamic dictionary size is much larger than the data stream, compression can improve with input permutation.

Back in the 90s I recall an unpopular but very efficient encoder called UltraCompressor[1] which had some input permutation heruristics, "solid" compression, and damage recovery to boot!

[1] http://www.compression.ru/arctest/self/uc.htm