Hacker News new | ask | show | jobs
by verytrivial 1919 days ago
I think it's right regarding the limited horizon of the window. Any dynamic compression scheme that evolves knowledge of the input stream will have this property. The tar block being compressed here is many orders of magnitude larger than the window, so the order in which the compressor sees the stream matters.

It is somewhat more complicated because e.g. dynamic Huffan coding rarely applied to raw streams but to a series of one or more encoders such as run-length, but the reasoning remains that if the window or coded/dynamic dictionary size is much larger than the data stream, compression can improve with input permutation.

Back in the 90s I recall an unpopular but very efficient encoder called UltraCompressor[1] which had some input permutation heruristics, "solid" compression, and damage recovery to boot!

[1] http://www.compression.ru/arctest/self/uc.htm