Hacker News new | ask | show | jobs
by kybernetikos 4342 days ago
By the nature of things, anything that compresses some input data must necessarily lengthen other input data, since you can't get away from the fact that there are only so many input files that can be represented by the output number of bits. In fact, it will almost certainly lengthen many more of the possible inputs than it shortens.

I once heard someone describe compression programs as 'expansion programs with interesting failure cases', and so, of course, the best compression program to use depends on exactly which failure cases you're interested in.

1 comments

While true this doesn't seem to be a practical issue. Any uncompressable data can be encoded using only one bit of overhead, where the bit is a flag indicating whether the rest of the data is compressed. In practice, there is a header and a field indicating which compression method to use. You pay for the size of the header. Adding support for another compression method is nearly free as far as space is concerned; one byte can switch between 256 of them. (Time is another matter.)