| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by virexene 960 days ago
	“the 2⁵² numbers between 0 and 1 that double precision floating point can represent” it's a bit of a nitpick, but i believe there are 1023×2⁵² such numbers, which is quite a bit more. there are 2⁵² double precision floats in just [0.5;1)!

2 comments

mbauman 960 days ago

That's not a nitpick at all — it has huge ramifications for how `Math.random()` needs to be interpreted. Many folks want it (and transformations of it!) to behave as though it were a true random variable sampled from a theoretically ideal [0, 1) uniform distribution.

But it's not, and will never be. It can only ever return values that have been rounded down to some floating point number. Even if you realize this, you still might expect it to be able to return _any_ floating point number x∈[0, 1) with `eps(x)` probability. But that's not typically the case, either. Typical implementations round down to the previous multiple of `eps(1.0)` or `eps(0.5)`.

link

fanf2 960 days ago

For more on this topic see https://news.ycombinator.com/item?id=37926005

link

mbauman 960 days ago

That article makes the same unstated assumption, attempting to treat `random()`'s output (and transformations thereof) as true random variables. But it just changes the rounding mode and bin size in a manner that _happens_ to work better for some transforms like log.

It changes the fundamental property of the distribution — the cdf — without stating it.

With floating point realizations of [0, 1) intervals:

   cdf(p) = P(random() < p) := p

With (0, 1] intervals:

   cdf(p) = P(random() <= p) := p

link

corsix 960 days ago

Using your cdf framing, while there is a point about [0, 1) versus (0, 1] intervals, the bigger point of the article is about whether said cdf holds for any IEEE-754 double-precision p, or whether it only holds for p of the form i*2^-53 (for integer i).

link

mbauman 960 days ago

Oh I completely agree there's value in improving the accuracy of the cdf, especially for 32-bit float! There it can be quite a surprise that tests like `random() < x` can be off by as much as .6% for values of x on the order of 1f-5 and 6% for values on the order of 1f-6, even though they are an order of magnitude or two over `eps(1f0)`.

Following your post, I've found the following fast and straightforward and SIMD-friendly implementation that uses all 64 bits for a [0, 1) distribution:

    ```julia
    function random_float(rng)
        r = rand(rng, UInt64)
        last_bit = r & -r
        exponent = UInt64(2045)<<52 - reinterpret(UInt64, Float64(last_bit))
        exponent *= !iszero(r)
        fraction = ((r ⊻ last_bit)>>(8*sizeof(UInt64) - 52)) % UInt64
        return exponent | fraction
    end
    ```

link

fluoridation 960 days ago

You are correct, however picking floats uniformly like that is much more difficult, as some values should be thousands of orders of magnitude more likely to be picked than others. It's much easier to implement a generator that randomizes the mantissa and calls it a return.

link

fanf2 960 days ago

My favourite implementation is

    double random_double(rng_t *rng) {
        return((double)(random_uint64(rng) >> 11) * 0x1.0p-53);
    }

It has the advantage of not needing bit_cast (which C lacks) and has 53 instead of 52 bits of randomness.

link

smegsicle 960 days ago

i for one expect my random floating point numbers to have the same distribution as a random bitfield interpreted as a float, and anything else should be considered a bug

link

fanf2 960 days ago

I don’t think that would be very useful, because there would be a 1/2048 chance of getting a NaN, and it would be difficult to convert the numbers to a known distribution.

link