Hacker News new | ask | show | jobs
by leiroigh 982 days ago
This problem is much more acute in Float32.

Another sample implementation (maybe easier to read?) is in https://discourse.julialang.org/t/output-distribution-of-ran...

As far as I remember, the main reasons for not rocking the boat on that in julia were: Everybody generates rand floats wrong, so doing it right breaks expectations and compat; and there is a perf price of maybe 0.5-1.5 cycles/number to pay; and this is nontrivial to SIMD (basically because we don't have simd TZCNT / LZCNT / access to exponential distribution)

3 comments

> the main reasons for not rocking the boat on that in julia were: Everybody generates rand floats wrong, so doing it right breaks expectations and compat

Ugh, I’m sorry they had to deal with this and am sorry with the (understandable) choice they made.

What kind of things are relying on random floats being generated wrong?
Comparability between implementations: Say GPU vs CPU, or between languages, or to pseudocode in old papers.

Typical example where the badness of the floats bites you is if you do something like log(rand()), or 1/x, or more generally you map your uniform (0,1] interval via a cdf to generate a different distribution, and your mapping has a singularity at zero (this is like super standard -- e.g. how you generate exponentially distributed numbers. Or if you generate random vectors in an N-d ball, you're using a singular cdf to compute the magnitude your vector, multiplied with a normalized normal distributed N-d variable to generate the angular component).

Then the bad random floats (i.e. the non-smooth distribution close to zero) introduce real and visible bias after transformation. Afaiu the problem is well-known and serious people correct for it somewhere. If you fix the underlying problem without revisiting the now-obsolete fixes, then your results are biased again.

I'm not arguing for keeping the bad random float generation everywhere. I think it should be fixed, not just in julia but everywhere. I'm just saying that it's not a no-brainer and there is discussion and compat and communication and code audit involved in doing such a momentuous change.

Also, I'm not really up to putting in the work to champion that at the moment (arguing on the internet doesn't count ;)).

Nice. I am not surprised that the approach presented here was discussed in Julia circles already some 5 years ago. So it was about half as fast, if I interpret the thread correctly.
That thread was in 0.6 days on my long-dead broadwell using DSFMT as a C library with afaiu hand-written intrinsic code for bulk generation of floats. We switched RNG to xoshiro in the meantime which is faster and generates 64 bit numbers natively (as opposed to dsfmt which generated 53 bit natively...). So don't trust that these old timings represent current julia; I updated the thread with current timings.

I'd be very happy if this can of worms could be reopened, but am currently not active enough in julia dev to champion it.

Also somebody would need figure out something very clever for AVX2 / NEON. (AVX512 has the required instructions)

Also I can't imagine the mess with GPU -- if rand statistics differ widely between CPU and GPU that's a no-go, and I don't know what works well on which GPUs.