Hacker News new | ask | show | jobs
by a_wild_dandan 810 days ago
Doesn't the modern variation break for programs with lossless encoding/decoding? At least, for sufficiently long sequences? A Huffman/byte-pair encoding would shred any trillion-bit+ sequence, for instance. But I intuitively expect many random trillion-bit sequences exist.
2 comments

There is no encoding that would shred "any" (read: every) trillion bit sequence. If that were true, some fundamentals of information theory and compressibility would break down.

Lossless encoding works by taking advantage of the more commonly observed sequences of data having lower information entropy. For things like audio encoding, where discontinuous sequences aren't naturally observed (or pleasing to listen to), lossless encoding has a lot to work with.

For any fixed compression scheme, there is an input string that is actually lengthened by it rather than shortened.

However Huffman isn’t a fixed compression scheme since it makes a different frequency tree for different corpora.