Hacker News new | ask | show | jobs
by AnotherGoodName 721 days ago
>As an example, given the symbol frequencies of 1, 2, and 3 for a total of 6 symbols. Shannon limit [-(1/6)log2(1/6)-(2/6)log2(2/6)-(3/6)log2(3/6)]6 = 8.75 => 9 bits

This isn't how you calculate Shannon limit fwiw. If all symbols are of equal frequency simply take the total number of different symbols and simply log2(different_symbol_count). That's it.

So say you had 60 different permutations. Each of those is a symbol in entropy encoding. Each individual symbol takes log2(60) bits to store using arithmetic encoding. Which is exactly the statement and correct calculation you had for calculating the Shannon limit in the very next line :). As in the Shannon limit for 60 different symbols is absolutely 5.9bits not 9bits

Instead you have some weird calculation here that seems to take individual probabilities and add them back up again to give 9bits. That's very far off and incorrect.

1 comments

Thanks I'll correct, I didn't know you could adjust the symbols as you go with arithmetic coding, got an example of that?
I think you could just have a different code for each given index and all the symbols seen until that point.

This seems to do something similar: https://en.wikipedia.org/wiki/Context-adaptive_binary_arithm...

fwiw I was using the same math they did here to get 4.755 bits: https://en.wikipedia.org/wiki/Arithmetic_coding#Sources_of_i... [-(1/3)log2(1/3)-(1/3)log2(1/3)-(1/3)log2(1/3)]*3=4.7548
I think you are confused about probabilities and realized frequencies. A probability is a before-the-fact model. A realized frequency is an after the fact observation. If you flip a fair coin the probability of heads or tails is 50% but any given single flip will have a frequency of 1 for either heads or tails and 0 for the other. (What is true is that as you flip the coin a very large number of times the actual ratio you will see will approach 50/50 but that's definitely not true for say 8 tosses. This is called the asymptotic equipartition principle but it isn't really relevant here).