| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mbauman 960 days ago
	That's not a nitpick at all — it has huge ramifications for how `Math.random()` needs to be interpreted. Many folks want it (and transformations of it!) to behave as though it were a true random variable sampled from a theoretically ideal [0, 1) uniform distribution. But it's not, and will never be. It can only ever return values that have been rounded down to some floating point number. Even if you realize this, you still might expect it to be able to return _any_ floating point number x∈[0, 1) with `eps(x)` probability. But that's not typically the case, either. Typical implementations round down to the previous multiple of `eps(1.0)` or `eps(0.5)`.

1 comments

fanf2 960 days ago

For more on this topic see https://news.ycombinator.com/item?id=37926005

link

mbauman 960 days ago

That article makes the same unstated assumption, attempting to treat `random()`'s output (and transformations thereof) as true random variables. But it just changes the rounding mode and bin size in a manner that _happens_ to work better for some transforms like log.

It changes the fundamental property of the distribution — the cdf — without stating it.

With floating point realizations of [0, 1) intervals:

   cdf(p) = P(random() < p) := p

With (0, 1] intervals:

   cdf(p) = P(random() <= p) := p

link

corsix 960 days ago

Using your cdf framing, while there is a point about [0, 1) versus (0, 1] intervals, the bigger point of the article is about whether said cdf holds for any IEEE-754 double-precision p, or whether it only holds for p of the form i*2^-53 (for integer i).

link

mbauman 960 days ago

Oh I completely agree there's value in improving the accuracy of the cdf, especially for 32-bit float! There it can be quite a surprise that tests like `random() < x` can be off by as much as .6% for values of x on the order of 1f-5 and 6% for values on the order of 1f-6, even though they are an order of magnitude or two over `eps(1f0)`.

Following your post, I've found the following fast and straightforward and SIMD-friendly implementation that uses all 64 bits for a [0, 1) distribution:

    ```julia
    function random_float(rng)
        r = rand(rng, UInt64)
        last_bit = r & -r
        exponent = UInt64(2045)<<52 - reinterpret(UInt64, Float64(last_bit))
        exponent *= !iszero(r)
        fraction = ((r ⊻ last_bit)>>(8*sizeof(UInt64) - 52)) % UInt64
        return exponent | fraction
    end
    ```

link