Hacker News new | ask | show | jobs
by gzalo 721 days ago
If the symbol frequency is known beforehand and fixed for all messages, it's not really i.i.d, as each distribution depends on the previous symbols.
3 comments

Agree.

In the "For a more concrete example" paragraph, since we're assuming that each byte of 8 bits must have 4 zeros and 4 ones, we always know the 8'th bit after only seeing the first 7. Indeed, many times we would know the 7th and 8th bits after seeing the first 6. So this source is not IID (at the bit-by-bit level) and Shannon's result does not apply there.

As other commenters have said, the source could be IID at the byte level. As noted, it would have 8-choose-4 = 70 symbols. And then Shannon's results would apply to the bytes, not the bits.

The article doesn't give enough information to say whether the source they have in mind is or isn't IID at the byte level. But the obvious choice (all 8-choose-4 symbols are equiprobable) does allow us to compute the Shannon entropy, which is of course always:

  H = - E p(i) log(p(i))
where p(i) is the probability of the i'th symbol, which is always just

  p(i) = 1/(8 choose 4) = 1/70
Of course, then

  H = log(70)
So unsurprisingly the Shannon entropy gives the right answer.

*

Cover and Thomas is really good and intuitive on this. See section 3.2 of:

https://cs-114.org/wp-content/uploads/2015/01/Elements_of_In...

The thing that's wild is that, in the asymptotic case, "everything is in the typical set". The OP is kind of riffing on this, taking it very literally!

Whenever you're compressing a file on your computer the frequencies are known beforehand. Agree this does not apply to noisy-channel coding, only the Shannon source coding theorem in lossless compression. Good point though I'm not sure if this would be considered an entropy encoder, as previous values do have some impact. Any more info I should look at for that?
> Whenever you're compressing a file on your computer the frequencies are known beforehand.

Not at all. The best compressors learn to predict the frequencies as they go, they do not compute them up front and store them in the output.

You still have to store that info and that counts against the number of bytes you are using
Got it, I was thinking of the locations as being i.i.d. in that you do not have any information on their location. You and others are saying that adjusting the symbol counts as you go would give the same performance to other algorithms. Looking into that, thanks.