In the "For a more concrete example" paragraph, since we're assuming that each byte of 8 bits must have 4 zeros and 4 ones, we always know the 8'th bit after only seeing the first 7. Indeed, many times we would know the 7th and 8th bits after seeing the first 6. So this source is not IID (at the bit-by-bit level) and Shannon's result does not apply there.
As other commenters have said, the source could be IID at the byte level. As noted, it would have 8-choose-4 = 70 symbols. And then Shannon's results would apply to the bytes, not the bits.
The article doesn't give enough information to say whether the source they have in mind is or isn't IID at the byte level. But the obvious choice (all 8-choose-4 symbols are equiprobable) does allow us to compute the Shannon entropy, which is of course always:
H = - E p(i) log(p(i))
where p(i) is the probability of the i'th symbol, which is always just
p(i) = 1/(8 choose 4) = 1/70
Of course, then
H = log(70)
So unsurprisingly the Shannon entropy gives the right answer.
*
Cover and Thomas is really good and intuitive on this. See section 3.2 of:
The thing that's wild is that, in the asymptotic case, "everything is in the typical set". The OP is kind of riffing on this, taking it very literally!
Whenever you're compressing a file on your computer the frequencies are known beforehand. Agree this does not apply to noisy-channel coding, only the Shannon source coding theorem in lossless compression. Good point though I'm not sure if this would be considered an entropy encoder, as previous values do have some impact. Any more info I should look at for that?
Got it, I was thinking of the locations as being i.i.d. in that you do not have any information on their location. You and others are saying that adjusting the symbol counts as you go would give the same performance to other algorithms. Looking into that, thanks.
In the "For a more concrete example" paragraph, since we're assuming that each byte of 8 bits must have 4 zeros and 4 ones, we always know the 8'th bit after only seeing the first 7. Indeed, many times we would know the 7th and 8th bits after seeing the first 6. So this source is not IID (at the bit-by-bit level) and Shannon's result does not apply there.
As other commenters have said, the source could be IID at the byte level. As noted, it would have 8-choose-4 = 70 symbols. And then Shannon's results would apply to the bytes, not the bits.
The article doesn't give enough information to say whether the source they have in mind is or isn't IID at the byte level. But the obvious choice (all 8-choose-4 symbols are equiprobable) does allow us to compute the Shannon entropy, which is of course always:
where p(i) is the probability of the i'th symbol, which is always just Of course, then So unsurprisingly the Shannon entropy gives the right answer.*
Cover and Thomas is really good and intuitive on this. See section 3.2 of:
https://cs-114.org/wp-content/uploads/2015/01/Elements_of_In...
The thing that's wild is that, in the asymptotic case, "everything is in the typical set". The OP is kind of riffing on this, taking it very literally!