Hacker News new | ask | show | jobs
by virexene 960 days ago
“the 2⁵² numbers between 0 and 1 that double precision floating point can represent”

it's a bit of a nitpick, but i believe there are 1023×2⁵² such numbers, which is quite a bit more. there are 2⁵² double precision floats in just [0.5;1)!

2 comments

That's not a nitpick at all — it has huge ramifications for how `Math.random()` needs to be interpreted. Many folks want it (and transformations of it!) to behave as though it were a true random variable sampled from a theoretically ideal [0, 1) uniform distribution.

But it's not, and will never be. It can only ever return values that have been rounded down to some floating point number. Even if you realize this, you still might expect it to be able to return _any_ floating point number x∈[0, 1) with `eps(x)` probability. But that's not typically the case, either. Typical implementations round down to the previous multiple of `eps(1.0)` or `eps(0.5)`.

That article makes the same unstated assumption, attempting to treat `random()`'s output (and transformations thereof) as true random variables. But it just changes the rounding mode and bin size in a manner that _happens_ to work better for some transforms like log.

It changes the fundamental property of the distribution — the cdf — without stating it.

With floating point realizations of [0, 1) intervals:

   cdf(p) = P(random() < p) := p
With (0, 1] intervals:

   cdf(p) = P(random() <= p) := p
Using your cdf framing, while there is a point about [0, 1) versus (0, 1] intervals, the bigger point of the article is about whether said cdf holds for any IEEE-754 double-precision p, or whether it only holds for p of the form i*2^-53 (for integer i).
Oh I completely agree there's value in improving the accuracy of the cdf, especially for 32-bit float! There it can be quite a surprise that tests like `random() < x` can be off by as much as .6% for values of x on the order of 1f-5 and 6% for values on the order of 1f-6, even though they are an order of magnitude or two over `eps(1f0)`.

Following your post, I've found the following fast and straightforward and SIMD-friendly implementation that uses all 64 bits for a [0, 1) distribution:

    ```julia
    function random_float(rng)
        r = rand(rng, UInt64)
        last_bit = r & -r
        exponent = UInt64(2045)<<52 - reinterpret(UInt64, Float64(last_bit))
        exponent *= !iszero(r)
        fraction = ((r ⊻ last_bit)>>(8*sizeof(UInt64) - 52)) % UInt64
        return exponent | fraction
    end
    ```
You are correct, however picking floats uniformly like that is much more difficult, as some values should be thousands of orders of magnitude more likely to be picked than others. It's much easier to implement a generator that randomizes the mantissa and calls it a return.
My favourite implementation is

    double random_double(rng_t *rng) {
        return((double)(random_uint64(rng) >> 11) * 0x1.0p-53);
    }
It has the advantage of not needing bit_cast (which C lacks) and has 53 instead of 52 bits of randomness.
i for one expect my random floating point numbers to have the same distribution as a random bitfield interpreted as a float, and anything else should be considered a bug
I don’t think that would be very useful, because there would be a 1/2048 chance of getting a NaN, and it would be difficult to convert the numbers to a known distribution.