Hacker News new | ask | show | jobs
by TimonKnigge 1748 days ago
I see -- thank you for clarifying. So for a formally uniform distribution, is rejection sampling your only option ?
2 comments

Not at all! You can simply compute additional product bits (you don’t need to go bit-by-bit) until the “can this possibly carry out” test (which is just a comparison) says “no”. The probability that you need more bits falls off exponentially, so this happens very quickly.

What you lose by doing this is each value no longer consumes a bounded number of bits from the source, which is how it’s able to be “truly” uniform.

Apologies if you already knew this:

It feels as if this is still related to rejection sampling, in spite of not being quite that. If the algorithm samples some bits, and still hasn't made a decision, then it's as if all the previous samples have been "rejected". The difference seems to be that this algorithm has a memory of previously rejected samples, which causes it to be less likely to reject subsequent samples. "Adaptive rejection sampling" appears to be relevant: https://en.wikipedia.org/wiki/Rejection_sampling#Adaptive_re...

Yes, although the algorithm here just accepts a certain small amount of bias rather than reject. But the "corrected" version of this says that if I am rolling a 1dN the first random byte will have N-1 different "rejection" criteria, the next byte and every byte after will only have 1 "rejection" criterion, and so on.

Something that would be more interesting: if you had a random number primitive which could generate 1 with probability 1/2, 2 with probability 1/6, 3 with probability 1/12, and so on (p = 1/(N * (N+1))), that primitive would allow you to construct random irrational numbers uniformly sampled from [0, 1) as continued fractions. Since the rationals ℚ are a non-dense subset of ℝ, ignore them as measure-0.

Such a primitive would allow you to do a sort of rejection-sampling with only a finite number of rejections before you are guaranteed an answer, as a continued fraction for a rational has a finite expansion.

Q is a dense subset of R, but it's countable, so it has measure 0.
Nit: the rationals are dense in R.
It helps to work out an example, maybe.

So let's try to roll a 1d5. We generate a real number p uniformly in [0, 1). In terms of base-2 decimal notation we should return

    1, if p < 0.0011 0011 0011 0011...
    2, if p < 0.0110 0110 0110 0110...
    3, if p < 0.1001 1001 1001 1001...
    4, if p < 0.1100 1100 1100 1100...
    5, otherwise
These repetitions are conveniently 4 bits long and could be written in hexadecimal, though of course they don't have to be (a 1d7 repeats after 3 bits, a 1d11 repeats after 10 bits).

Well, there's a problem. Our computers don't have infinitely long precise decimals like this, instead we generate a finite stream of bits. But suppose we generate those streams in "chunks". Let's say we generate one byte b at a time, this says for the first byte,

    if b  < 0x33, return 1.
    if b == 0x33, we'll need another byte b2 to decide.
        if b2 < 0x33, return 1.
        if b2 == 0x33, repeat with a new b2
        if b2 > 0x33, return 2.
    if b  < 0x66, return 2.
    if b == 0x66, we'll need another byte b2 to decide.
        if b2 < 0x66, return 2.
        if b2 == 0x66, repeat with a new b2
        if b2 > 0x66, return 3.
    if b  < 0x99, return 3
    [ if b == 0x99, we'll need another byte ]
    if b  < 0xCC, return 4
    [ if b == 0xcc, we'll need another byte ]
    otherwise, return 5.
You can kind of call this rejection sampling but notice that the post-rejection loop has different bounds than the non-rejection loop, it only has a 1-in-256 chance of repeating and not the 1-in-64 chance of rejection sampling.

I want to say that the probability of observing something hinky using K samples of a 1dN die with 2^B bits is not as simple as sqrt(K)_= 2^B but something more pessimistic like maybe K = 2^(B+1) / N^4 or so? So if you're generating histograms with 2^32 iterations of a million-sided die you actually don't have a huge safety margin here, 2^130 / 2^112 is only one in 2^18 or so, one in 250,000 runs could maybe correctly detect the bias. But it's still presumably "good enough" for all that.

Thanks for this explanation, it made it click for me. The benefit of doing far less additional sampling in the rejection case is very intuitive, especially when the numbers are very large. It also seems this technique could be extended to other scenarios, such as 1/pi and other weird non-uniform bucket sizes.