by _wldu 1380 days ago
One example I like to use (when talking about entropy):

A four-digit numeric PIN that we already know has 0 bits of entropy: there is no uncertainty about what the PIN actually is. A randomly selected one that we do not know has just over 13 bits.

import math
print(math.log(10)/math.log(2)*4)

13.28771237954945

The more entropy, the more uncertain we are.

However, humans are not random. We use the year we were born, some keyboard sequence, or some other predictable number as our PIN. We don't know exactly how much entropy these PINs have (there is some degree of uncertainty), but we do know it is significantly less than 13 bits.
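To make that comparison concrete, here is a rough sketch. The skewed distribution is invented purely for illustration (it assumes 100 "popular" PINs account for half of all choices, with the rest spread evenly); real PIN frequencies will differ, and `entropy_bits` is just a helper name chosen here.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

N = 10_000  # number of possible four-digit PINs

# Uniformly random PIN: every value has probability 1/N.
uniform = [1 / N] * N

# Hypothetical human-chosen distribution (made-up numbers):
# 100 popular PINs share 50% of the probability, the other 9,900 share the rest.
popular, popular_mass = 100, 0.5
skewed = ([popular_mass / popular] * popular
          + [(1 - popular_mass) / (N - popular)] * (N - popular))

print(entropy_bits(uniform))  # the "just over 13 bits" case
print(entropy_bits(skewed))   # lower, because some PINs are far more likely
```

Under these made-up numbers the skewed distribution comes out at roughly 11 bits. The exact figure depends entirely on the assumed probabilities; the point is only that any non-uniform distribution over the same 10,000 PINs has less entropy than the uniform one.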

2 comments

I wrote a blog post [1] with an interactive widget where you can provide an encoding for a random decimal digit and see how close you can get to the theoretical log₂(10) ≈ 3.32 bits.

[1]: https://blog.kardas.org/post/entropy/ (Average Code Length section)
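For anyone who wants to try this offline, here is a minimal sketch of one way to build such an encoding. It is not the widget from the blog post; it assumes Huffman coding as the technique (my choice, not necessarily the post's) and a uniformly random digit, and `huffman_lengths` is a helper name made up for this example.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return the code length (in bits) that Huffman coding assigns to each symbol."""
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [1 / 10] * 10  # a uniformly random decimal digit
lengths = huffman_lengths(probs)
avg = sum(p * n for p, n in zip(probs, lengths))

print(sorted(lengths))      # six 3-bit codes and four 4-bit codes
print(avg, math.log2(10))   # average code length vs. the entropy bound
```

With one digit per codeword the average is about 3.4 bits, a little above log₂(10) ≈ 3.32. Encoding several digits at once gets closer: three digits fit in 10 bits (1000 < 1024), which is about 3.33 bits per digit.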

Here's a riddle.

If using my birthday reduces the entropy of my PIN, what does it do to its entropy if I happen to have the same birthday as one of the most famous people in the world? Does it matter if I am aware or not? Does it matter what they use for their PIN?

For the sake of argument, I'm thinking month and day, not year.

The important thing is that your PIN has zero entropy, regardless of its value. Entropy is a property of distributions, not of individual values. You may be thinking of the probability (or information content) your PIN is assigned when looking at the overall distribution of PINs, in which case it probably does matter how popular your birthday is (and whether it also matches common patterns people use for PINs). This does feed into the calculation of entropy for the distribution, but the resulting entropy then no longer tells you anything about your PIN specifically. The probability also only makes sense relative to a distribution, so it matters how you specify the set of PINs you are comparing yours to.

The 'information content' of a given outcome is the logarithm of the inverse of its probability (i.e. more unlikely events give you more information), and the entropy of a distribution is the expected value of this information content.
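Those two definitions can be sketched in a few lines. The function names and the toy probabilities below are made up for illustration; nothing here is measured data.

```python
import math

def information_content(p):
    """Information content (surprisal), in bits, of an outcome with probability p."""
    return math.log2(1 / p)

def entropy(dist):
    """Expected information content of a distribution {outcome: probability}."""
    return sum(p * information_content(p) for p in dist.values() if p > 0)

# Toy distribution over three outcomes (illustrative probabilities).
dist = {"a": 0.5, "b": 0.25, "c": 0.25}

for outcome, p in dist.items():
    print(outcome, information_content(p))  # a: 1.0, b: 2.0, c: 2.0

print(entropy(dist))  # 1.5

# A PIN that is already known is a distribution with a single certain outcome.
known = {"1234": 1.0}
print(entropy(known))  # 0.0
```

The less likely outcomes "b" and "c" carry more information (2 bits each) than "a" (1 bit), and the entropy of 1.5 bits is their probability-weighted average. The single-outcome case shows why a specific, known PIN has zero entropy.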

That is not a riddle, just a question of how to handle information about the distribution of passwords in the population. You get the same answer as if it were four alphabetic characters and your choice were "soup".