| The author has built the following encoder: - A genie hands the encoder the type of an iid source sequence (the function N here: [1]) - The encoder produces a representation of the source with fewer bits than what would be possible using a Shannon-efficient encoder that didn't know the type. - You also need to know the type to decode. The reduction in bits from Shannon's theorem is explained by the genie revealing this extra information that reduced the input sequence's entropy (also causing it to become non-iid).
The collection of letter-typical sequences of specific type is indeed often going to be smaller than (entropy-rate)^(blocklength), and that is what we are seeing here.
This is where the author's explanation starts to go astray: > The Shannon limit uses Stirling's approximation which is an asymptotic approximation for factorials and as such the approximation is too large for small values and becomes more accurate as the factorial size (i.e. message length and symbol count) approaches infinity. [1]: https://en.wikipedia.org/wiki/Typical_set#Strongly_typical_s... I would recommend the author read any information theory textbook, starting at the latest, at the part that covers Asymptotic Equipartition Properties. This is the crucial definition that will make the source coding theorem concrete, in my opinion. |