| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bloaf 2216 days ago
	I agree. I've seen some marvelous pigeon hole explanations of why you cannot write a general lossless compression algorithm that makes all files smaller. I have no idea what that explanation would look like in terms of maxes and averages, and I have to believe it would be much more difficult to understand.

3 comments

kd0amg 2215 days ago

Yes, talking about averages sounds like it might lead a prover astray into reasoning about the average size of compressed data. The average that matters here though is how many inputs map to the same output.

Compressing (n+1)-bit input to n-bit output makes the average output correspond to 2^(n+1)/2^n = 2 inputs. By Dijkstra's rephrasing of the pigeonhole principle, the maximum number of inputs colliding on the same output must be at least 2.

Or we could make it more robust and say we're compressing to n-bit-or-smaller output. Then the average output corresponds to 2^(n+1)/(2^(n+1)-1) inputs. That average is still strictly greater than 1, so the maximum number of inputs colliding on the same output must also be strictly greater than 1 (at least 2, since that's the smallest natural which is greater than 1).

link

atq2119 2216 days ago

That's really more about surjectivity and simple counting. Neither the pigeon hole principle not a max/average argument are really necessary.

Say you're trying to find a way to compress every n+1 bit string into an n bit string. Losslessness means that you have to come up with a function d from n bit string to n+1 bit string that decompresses. The number of n+1 bit string is larger, so there must be n+1 bit strings that are not in the image of d, which means that such a function d cannot exist.

link

morelisp 2216 days ago

This is still the pigeonhole principle, just inverted - If you have more holes than pigeons (and you can’t make fractional or quantum pigeons) at least one hole must have no pigeons.

link

atq2119 2215 days ago

It's similar, but that's not usually referred to as the pigeon hole principle.

link

morelisp 2215 days ago

It's more than similar, it's gone from talking about the necessary properties of a function with an image smaller than its domain, to the impossibility of an inverse function as the image cannot be larger than its domain.

We might as well call this "the whole-pigeon principle."

link

phaemon 2215 days ago

A "file" on a computer is just a number.

When you compress a file, you are representing a big number with a small number.

There are more big numbers than there are small numbers.

link

Dylan16807 2215 days ago

You've just expressed part of the problem with numbers, you haven't tied them into averages and minimums/maximums.

link

phaemon 2215 days ago

Why is that required?

link

Dylan16807 2215 days ago

The prompt you were responding to was "I have no idea what that explanation would look like in terms of maxes and averages, and I have to believe it would be much more difficult to understand."

Specifically, they were looking for how you write the can't-always-compress proof using formulation (1)

link

phaemon 2214 days ago

Oh right. I read it as "how else could you explain it simply". I always thought the pigeon hole principal was a needless metaphor.

link