|
It helps to work out an example, maybe. So let's try to roll a 1d5. We generate a real number p uniformly in [0, 1). In terms of base-2 decimal notation we should return 1, if p < 0.0011 0011 0011 0011...
2, if p < 0.0110 0110 0110 0110...
3, if p < 0.1001 1001 1001 1001...
4, if p < 0.1100 1100 1100 1100...
5, otherwise
These repetitions are conveniently 4 bits long and could be written in hexadecimal, though of course they don't have to be (a 1d7 repeats after 3 bits, a 1d11 repeats after 10 bits).Well, there's a problem. Our computers don't have infinitely long precise decimals like this, instead we generate a finite stream of bits. But suppose we generate those streams in "chunks". Let's say we generate one byte b at a time, this says for the first byte, if b < 0x33, return 1.
if b == 0x33, we'll need another byte b2 to decide.
if b2 < 0x33, return 1.
if b2 == 0x33, repeat with a new b2
if b2 > 0x33, return 2.
if b < 0x66, return 2.
if b == 0x66, we'll need another byte b2 to decide.
if b2 < 0x66, return 2.
if b2 == 0x66, repeat with a new b2
if b2 > 0x66, return 3.
if b < 0x99, return 3
[ if b == 0x99, we'll need another byte ]
if b < 0xCC, return 4
[ if b == 0xcc, we'll need another byte ]
otherwise, return 5.
You can kind of call this rejection sampling but notice that the post-rejection loop has different bounds than the non-rejection loop, it only has a 1-in-256 chance of repeating and not the 1-in-64 chance of rejection sampling.I want to say that the probability of observing something hinky using K samples of a 1dN die with 2^B bits is not as simple as sqrt(K)_= 2^B but something more pessimistic like maybe K = 2^(B+1) / N^4 or so? So if you're generating histograms with 2^32 iterations of a million-sided die you actually don't have a huge safety margin here, 2^130 / 2^112 is only one in 2^18 or so, one in 250,000 runs could maybe correctly detect the bias. But it's still presumably "good enough" for all that. |