| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by red369 1040 days ago

>Every lossless “compression” algorithm makes some strings shorter and other strings longer, such that the average space savings against all possible strings is neutral at best. This is a fundamental consequence of the way information works — there’s no way to losslessly map (say) 8 bits of input into 7 bits of output.

I think I'm missing a point here. Is it a provable fact that all lossless compression algorithms are neutral at best? Or just something that occurs in the real world?

I can't think of a good example, so I'll give a trivial example to explain my thinking.

It seems to me that you could have an algorithm which did not change any text, except the sentence "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo", which it encodes as something much shorter like "<Buffalo*8>". The algorithm would be (very marginally) better than neutral. If this is true, then surely other improvements would be possible to make a lossless algorithm with real reduction.

(Edited a typo)

2 comments

tarvaina 1040 days ago

The impossibility of a general lossless compression algorithm is a consequence of the pigeon-hole principle.

https://en.m.wikipedia.org/wiki/Pigeonhole_principle

link

coryrc 1040 days ago

Then how do you encode an input of "<Buffalo*8>"? If your answer is "I'm using characters outside the alphabet" then, there's your answer, you're wasting bits on them.

link