Hacker News new | ask | show | jobs
by zzazzdsa 3644 days ago
There's a pretty easy proof to show why compression of truly random data is impossible. Let C be a compression algorithm. Assume that for all bit strings S, C(S) is not longer than S, and assume that there exists strings S' such that C(S') is shorter than S'. Also, assume that for any two distinct bit strings S1 and S2, C(S1) != C(S2). Let S'' be the shortest such string: the shortest string that C shortens. Let the length of S'' be M, and the length of C(S'') be N < M. Note N >= 1, and by assumption every string of length N does not have its length changed by C. There are 2^N strings of length N, and by assumption it is easy to see that since the images of any pair of N bit strings under C are distinct, every string of N bits is the image of some other N bit string. But now, S'' has length M but has an image of length N. This necessarily collides with the image of some string of length N, a contradiction. Thus, for any compression algorithm which is injective (in effect, reversible) and successfully shortens some strings, the algorithm must lengthen some others.

What this short proof shows is that even though random data may have patterns, no compression algorithm can successfully leverage this for every random string.