Hacker News new | ask | show | jobs
by crdrost 1748 days ago
It helps to work out an example, maybe.

So let's try to roll a 1d5. We generate a real number p uniformly in [0, 1). In terms of base-2 decimal notation we should return

    1, if p < 0.0011 0011 0011 0011...
    2, if p < 0.0110 0110 0110 0110...
    3, if p < 0.1001 1001 1001 1001...
    4, if p < 0.1100 1100 1100 1100...
    5, otherwise
These repetitions are conveniently 4 bits long and could be written in hexadecimal, though of course they don't have to be (a 1d7 repeats after 3 bits, a 1d11 repeats after 10 bits).

Well, there's a problem. Our computers don't have infinitely long precise decimals like this, instead we generate a finite stream of bits. But suppose we generate those streams in "chunks". Let's say we generate one byte b at a time, this says for the first byte,

    if b  < 0x33, return 1.
    if b == 0x33, we'll need another byte b2 to decide.
        if b2 < 0x33, return 1.
        if b2 == 0x33, repeat with a new b2
        if b2 > 0x33, return 2.
    if b  < 0x66, return 2.
    if b == 0x66, we'll need another byte b2 to decide.
        if b2 < 0x66, return 2.
        if b2 == 0x66, repeat with a new b2
        if b2 > 0x66, return 3.
    if b  < 0x99, return 3
    [ if b == 0x99, we'll need another byte ]
    if b  < 0xCC, return 4
    [ if b == 0xcc, we'll need another byte ]
    otherwise, return 5.
You can kind of call this rejection sampling but notice that the post-rejection loop has different bounds than the non-rejection loop, it only has a 1-in-256 chance of repeating and not the 1-in-64 chance of rejection sampling.

I want to say that the probability of observing something hinky using K samples of a 1dN die with 2^B bits is not as simple as sqrt(K)_= 2^B but something more pessimistic like maybe K = 2^(B+1) / N^4 or so? So if you're generating histograms with 2^32 iterations of a million-sided die you actually don't have a huge safety margin here, 2^130 / 2^112 is only one in 2^18 or so, one in 250,000 runs could maybe correctly detect the bias. But it's still presumably "good enough" for all that.

1 comments

Thanks for this explanation, it made it click for me. The benefit of doing far less additional sampling in the rejection case is very intuitive, especially when the numbers are very large. It also seems this technique could be extended to other scenarios, such as 1/pi and other weird non-uniform bucket sizes.