Hacker News new | ask | show | jobs
by canjobear 2165 days ago
You might want to give more conditions for the claim that the self-information gives the length of the shortest possible code.

In particular the condition isn't only that it's the shortest code, it's the shortest self-delimiting code. In your example with probabilities {1/2, 1/4, 1/8, 1/8}, someone could come in and say let's code it as {0,1,01,00}, which would appear to encode the latter two outcomes 2 bits rather than 3. The problem, of course, is that {0,1,01,00} is not a self-delimiting code: after you receive the bit 0, you don't know if you're done or if you should wait for another bit to form 01. But the code {0,10,110,111} is self-delimiting, because after you get a 0 or a 10, etc., you know you're done.

I've found that when I teach this material, if I don't mention the self-delimiting condition, then a clever student in the class always thinks of the {0,1,01,00}-type code. (This can be a good way to identify clever students in an intro information theory class!)