

by olliej 2766 days ago

No, because if a receiver does not know the starting state of the compression algorithm, then no matter what the algorithm is, it cannot decompress.
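One way to see this in practice is with a preset dictionary in Python's standard `zlib` module. This is only a sketch: zlib's preset dictionary stands in for whatever trained dictionary is being discussed, and the `preset` and `message` bytes are made up for illustration.

```python
import zlib

# Hypothetical shared starting state: bytes both sides agreed on in advance.
preset = b'{"user": "", "status": "active", "items": []}'
message = b'{"user": "alice", "status": "active", "items": []}'


def compress_with(dictionary: bytes, data: bytes) -> bytes:
    """Compress data starting from a preset dictionary."""
    c = zlib.compressobj(zdict=dictionary)
    return c.compress(data) + c.flush()


def decompress_with(dictionary: bytes, blob: bytes) -> bytes:
    """Decompress a blob, supplying the same preset dictionary."""
    d = zlib.decompressobj(zdict=dictionary)
    return d.decompress(blob) + d.flush()


blob = compress_with(preset, message)

# Same starting state on both ends: the round trip succeeds.
roundtrip = decompress_with(preset, blob)

# A receiver that was never given the dictionary cannot decode the stream.
try:
    zlib.decompressobj().decompress(blob)
    failed_without_dictionary = False
except zlib.error:
    failed_without_dictionary = True
```

The receiver without the dictionary gets a `zlib.error` rather than the message, because the stream is only meaningful relative to the state the sender started from.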

Basically the trained dictionary can be thought of as a generic "context" for a compression algorithm. The process of compressing a symbol in any compression algorithm can be summed up with a function: compress(Context, Symbol) -> (New context, Bits[1])

Decompression is then always the mirror image: decompress_symbol(Context, Stream) -> (New context, Symbol).
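As a rough sketch of that pair of functions, here is a move-to-front transform written in the same shape. It is a stand-in chosen for brevity, not the algorithm under discussion. The names `compress_symbol`, `encode`, and `decode` are made up for this example, the integer code plays the role of Bits, and the decoder takes one code at a time instead of a bit stream.

```python
from typing import List, Tuple

# The context here is just an ordered tuple of known symbols.
Context = Tuple[str, ...]


def compress_symbol(context: Context, symbol: str) -> Tuple[Context, int]:
    """Emit the symbol's current position, then move it to the front."""
    index = context.index(symbol)
    new_context = (symbol,) + context[:index] + context[index + 1:]
    return new_context, index


def decompress_symbol(context: Context, code: int) -> Tuple[Context, str]:
    """Look the position up in the context, then apply the same update."""
    symbol = context[code]
    new_context = (symbol,) + context[:code] + context[code + 1:]
    return new_context, symbol


def encode(start: Context, text: str) -> List[int]:
    context, codes = start, []
    for symbol in text:
        context, code = compress_symbol(context, symbol)
        codes.append(code)
    return codes


def decode(start: Context, codes: List[int]) -> str:
    context, out = start, []
    for code in codes:
        context, symbol = decompress_symbol(context, code)
        out.append(symbol)
    return "".join(out)


start = tuple("abn")
codes = encode(start, "banana")

same_start = decode(start, codes)          # decoder mirrors the encoder
other_start = decode(tuple("nba"), codes)  # decoder starts out of sync
```

With the same starting context the decoder tracks the encoder step for step and recovers the input. Starting from a different ordering, every lookup lands on the wrong symbol and the output is garbage.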

The important thing is that to be able to decompress a symbol, the decompression engine needs to know the exact state of the compression algorithm at each point. If it's difficult to see why, imagine a trivial compression algorithm: allocate a number to each word in a dictionary, then replace each word in the input with its assigned number. It's then obvious that the dictionary the decompression engine uses has to be identical to the one used for compression. This is a simplification, but the same logic applies to every compression algorithm. Even a static Huffman table has this property: compress(context, symbol) returns the same context it was given, but the static table still has to be transmitted before any decompression happens.
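The word-numbering example is small enough to write out. This is a toy sketch only; the word lists and the names `build_dictionary`, `compress_words`, and `decompress_words` are invented for illustration.

```python
def build_dictionary(words):
    """Assign each word a number based on its position in the list."""
    return {word: number for number, word in enumerate(words)}


def compress_words(dictionary, text):
    """Replace each word in the input with its assigned number."""
    return [dictionary[word] for word in text.split()]


def decompress_words(dictionary, numbers):
    """Map numbers back to words using the receiver's dictionary."""
    reverse = {number: word for word, number in dictionary.items()}
    return " ".join(reverse[n] for n in numbers)


sender = build_dictionary(["the", "cat", "sat", "on", "mat"])
# A receiver whose dictionary differs in a single entry.
mismatched = build_dictionary(["the", "dog", "sat", "on", "mat"])

numbers = compress_words(sender, "the cat sat on the mat")

good = decompress_words(sender, numbers)
bad = decompress_words(mismatched, numbers)
```

With the sender's dictionary the text comes back unchanged. With the mismatched one the receiver silently decodes a different sentence, which is arguably worse than an outright failure.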

The illogical extreme for an algorithm is to include a specific entry for specific inputs - for example the "honest" algorithm at https://nerget.com/compression/

1. Note that for some algorithms Bits may technically not be a whole number of bits (arithmetic coding, for example, can spend a fraction of a bit on a symbol).