Hacker News new | ask | show | jobs
by j_not_j 379 days ago
Downey published a (big-endian, SPARC) method in 2007:

https://allendowney.com/research/rand/

And Lemire in 2017 asked the question

https://lemire.me/blog/2017/02/28/how-many-floating-point-nu...

The basic approach: collect top 60 bits of a 64-bit PRNG. (Assume the LSBs are corrupt or zero or nonrandom). Set exponent zero. If the top mantissa bit is zero, shift left, subtract one from exponent. Repeat. When you run out of bits, collect another PRNG from your generator and resume until you have 56 bits of mantissa. When done, your floating-point PRNG is 2^exponent * mantissa.

My short explanation: Suppose you want a random distance. Multiplying a PR integer is the same as selecting a random tile in the distance, and measuring the distance to the (far) edge of the tile. This is the multiply method.

But if you want a real-valued distance even for very near distances, you need to scale your random number and ensure you have random bits throughout the mantissa. So reduce the exponent for every leading zero in your PR integer and shift. Add more bits if needed.

Test: the histogram of exponents for a large-enough set of samples is linear, give or take. Very small floating numbers (less than say 1/32768) are 1/16 as likely as numbers in [0.5,1).

2 comments

Now that I've actually looked at the "utl::random" code in the OP, I see that its UniformRealDistribution is a wrapper around std::generate_canonical, so the juicy bits about turning a random into into a random float are not exposed here at all. But the utl::random code does include an pointer* to an informative C++ working group note.

* https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p09...

Thanks for the additional pointers!

I like the idea of looking at the histogram of exponents: incrementing (by 1) the exponent doubles the width of the interval so should double the number of hits. Conversely the histogram of the significand (or any subset of its bits) should be flat.

Beebe publishes a lengthy bibliography.

https://www.netlib.org/tex/bib/prng.pdf

758 pages and counting....

Lengthy indeed. Do you have advice for how to find amongst these the first paper on correctly sampling [0,1) with real-world floating point?