Hacker News new | ask | show | jobs
by kbsletten 2687 days ago
So, am I correct that this would give you effectively 52 random bits instead of the maximum 53? Is the problem that the 53rd bit would mean there are possibly invalid values you'd have to throw out, or that the distribution wouldn't be uniform?
1 comments

The problem is that: a, there are over a thousand random bits to fill, and you'll need a significantly more sophisticated mechanism to fill them; b, if you need more than a dozen significant digits, 64-bits floats probably aren't what you want anyway; and c, clever tricks aren't the place for rigorous correctness when "correct to within rounding errors" will do just as well.

If you really want you could tack on:

  ret += (entropy >> 63) * 0x1.0p-53; // 1.110223024625156540e-16;
  OR
  ret += (entropy >> 52) * 0x1.0p-64; // 5.421010862427522170e-20;
right before the return to pack in some extra bits.
This reply made me even more confused. According to my naive understanding of floating point there are 53 bits for the mantessa (sp?) so in my head, you would specify a exponent of 2^(-53) and fill the mantessa with random bits to get a "pretty good" random number. If you wanted to represent every possible value of a double, then you have to vary the exponent and your distribution becomes non-uniform. I assume I'm missing something fairly obvious to others, but your value seems to be exactly one bit short of the "ideal" (if you call it that, but really it's just my imagination of a "perfect" random double). I see your reply mentions you can get back that bit, but I don't understand why it has to be it's own step. What limitation of doubles are you overcoming when you use the range 1-2 and then subtract one instead of just generating in the range 0-1?
For IEEE floats, the range [1, 2) has uniform absolute precision. Not so for [0, 1), which spans many values of the exponent.

There is uniform absolute precision on each interval among [0.5, 1), [0.25 0.5), [0.125, 0.25), &c., but as you get closer and closer to 0, there are still just as many fraction bits available (same relative precision), so these numbers are much more precise in absolute terms (i.e. relative to 1.0).

You can generate a random number in [1, 2) by just filling in the fraction part with random bits, because the exponent is always the same. Then you can subtract 1 to get a number in the range [0, 1). But for a number in e.g. the range [0.125, 0.25), the last 3 bits of the fraction will always be 0, meaning you are skipping 7/8 of the possible floating point numbers in that range.

The smallest possible denormal number among IEEE doubles is 2^(–1074), but the smallest non-zero value you will get with this method is 2^(–52).

If you want to generate any possible floating point number in the range [0,1) with probability proportional to the difference between it and the next representable number, it gets pretty tricky / computationally expensive.

This was pretty much my reasoning as well, but I missed that you can generate a floating-point integer in [0,2^53) and then scale down to [0,1) with a multiply, which gets you one additional significant bit for free or cheaper, and simplifys the code.

Edit: I think for the general case you want something like:

  uint64 x = rand64() & 0x001F'FFFF'FFFF'FFFF
  double xf = .5 * 1p-52 * x
  // return xf; /* 53 bits of entropy */
  if(xf < .5)
    {
    y = rand64() & 0x000F'FFFF'FFFF'FFFF
    double yf = .5 * 1p-52 * 1p-52 * y
    xf += yf
    if(xf < .5 * 1p-52) ... /* loop about 20 times */
    }
Possibly we could generate a random number at the subnormal range (with 0 exponent) then multiply by 0.99999~ divided by the greatest possible 0 exponent number.

  uint64 x = rand64()&0x000F'FFFF'FFFF'FFFF
  return *(double*)&x * 4.49423283715579e+307
I tweaked the multiplier as high as possible without producing 1, but there might be a fraction of a bit bias introduced by the multiplication compared to the method of subtracting 1 from 1-2 range.

edit: That multiplier seems to be essentially 2^1022, in which case there may be no quantization noise possible in multiplying by it. It may simply cause the exponent to be increased and the result normalised.

> What limitation of doubles are you overcoming when you use the range 1-2 and then subtract one instead of just generating in the range 0-1?

The limitation is that floating point numbers have different exponent depending on where in the range [0,1) they are, while every float in [1,2) has a exponent of 3FF. Generating a number in the latter range and subtracting 1.0 lets the floating point hardware do the work of figuring out the exponent.

On futher testing, though, it looks like you can just do ( rand53() * 1p-53 ), which will be hands down better as long as your floating-point hardware has a good int64->double conversion.