| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by mistercow 5047 days ago

>Why is this an invalid approach?

Because of information theory. It's believed that the binary expansion of pi contains all possible finite bit sequences. A program that expands pi is relatively small. Assuming that hypothesis is true, does that mean we could write a compressor that simply breaks input into blocks and then encodes the output as indexes into pi?

And the answer is that we certainly could write that program. And the result would almost always be a larger output than input. Why? Because you'd have to search so far into pi that the index would contain more bits than the input blocks. In short, having a library of arbitrary strings to draw from does not help you to compress data.

More generally, compression is impossible in the general case. Every lossless compression algorithm will, on average, produce an equal or larger output than its input.