Hacker News new | ask | show | jobs
by danger 5717 days ago
How is that related to the SO question? The post is asking about how to tell if one distribution is "more random" than another, which is what entropy is all about.
1 comments

Consider a random number generator that produces integers between 0 and 15. Here's a really crappy algorithm:

1. Start with a truly random seed between 0 and 15.

2. Increment it each time to generate a random number, modulo 16.

Suppose you start at 11. Your sequence of random numbers will be 11, 12, 13, 14, 15, 1, 2, 3 ....

This is obviously not very random. But look at the entropy of it. All values between 0 and 15 are equally likely, so this will have the maximum possible entropy: 4 bits.

The problem here is that, for entropy to be an accurate measurement of information content, you have to assume that you're measuring independent identically distributed random variables. The outputs from a pseudo-random number generator are not independent.

If we are interested in a _sequence_ of pseudorandom numbers, then we should be talking about the _joint_ entropy of the sequence. But information theory is plenty capable of describing "how random" a sequence is.

The generator you are talking about is far from producing a uniform distribution (which would have maximum entropy) jointly over the n-dimensional hypercube that would represent a sequence of n draws from your generator, so it has less entropy and is thus "less random" than the distribution you'd get from n independent draws from a perfect rand().

You're right (have an upvote); I was warning more against rash misuse of the concept of entropy, which I see quite a bit.