Hacker News new | ask | show | jobs
by pjscott 5718 days ago
Consider a random number generator that produces integers between 0 and 15. Here's a really crappy algorithm:

1. Start with a truly random seed between 0 and 15.

2. Increment it each time to generate a random number, modulo 16.

Suppose you start at 11. Your sequence of random numbers will be 11, 12, 13, 14, 15, 1, 2, 3 ....

This is obviously not very random. But look at the entropy of it. All values between 0 and 15 are equally likely, so this will have the maximum possible entropy: 4 bits.

The problem here is that, for entropy to be an accurate measurement of information content, you have to assume that you're measuring independent identically distributed random variables. The outputs from a pseudo-random number generator are not independent.

1 comments

If we are interested in a _sequence_ of pseudorandom numbers, then we should be talking about the _joint_ entropy of the sequence. But information theory is plenty capable of describing "how random" a sequence is.

The generator you are talking about is far from producing a uniform distribution (which would have maximum entropy) jointly over the n-dimensional hypercube that would represent a sequence of n draws from your generator, so it has less entropy and is thus "less random" than the distribution you'd get from n independent draws from a perfect rand().

You're right (have an upvote); I was warning more against rash misuse of the concept of entropy, which I see quite a bit.