Hacker News new | ask | show | jobs
by fanf2 960 days ago
For more on this topic see https://news.ycombinator.com/item?id=37926005
1 comments

That article makes the same unstated assumption, attempting to treat `random()`'s output (and transformations thereof) as true random variables. But it just changes the rounding mode and bin size in a manner that _happens_ to work better for some transforms like log.

It changes the fundamental property of the distribution — the cdf — without stating it.

With floating point realizations of [0, 1) intervals:

   cdf(p) = P(random() < p) := p
With (0, 1] intervals:

   cdf(p) = P(random() <= p) := p
Using your cdf framing, while there is a point about [0, 1) versus (0, 1] intervals, the bigger point of the article is about whether said cdf holds for any IEEE-754 double-precision p, or whether it only holds for p of the form i*2^-53 (for integer i).
Oh I completely agree there's value in improving the accuracy of the cdf, especially for 32-bit float! There it can be quite a surprise that tests like `random() < x` can be off by as much as .6% for values of x on the order of 1f-5 and 6% for values on the order of 1f-6, even though they are an order of magnitude or two over `eps(1f0)`.

Following your post, I've found the following fast and straightforward and SIMD-friendly implementation that uses all 64 bits for a [0, 1) distribution:

    ```julia
    function random_float(rng)
        r = rand(rng, UInt64)
        last_bit = r & -r
        exponent = UInt64(2045)<<52 - reinterpret(UInt64, Float64(last_bit))
        exponent *= !iszero(r)
        fraction = ((r ⊻ last_bit)>>(8*sizeof(UInt64) - 52)) % UInt64
        return exponent | fraction
    end
    ```