Hacker News new | ask | show | jobs
by rurban 1878 days ago
MT is pretty slow. There are fast variants (SFMT), but why use that when can have much better and much faster rngs?

https://rurban.github.io/dieharder/QUALITY.html

2 comments

I don't know about all the other RNGs in this benchmark, but MT prepares a dense/contiguous array of random numbers. The whole computation can be SIMDified (often even by the compiler), but extraction is (unless changed explicitly) a single copy of a double.

If RNG speed matters and you need a lot of random numbers in succession (and I assume these two assumptions correlate strongly) editing MT to directly pull 4 or even 8 random numbers at once (i.e. a `__m256 rand_m256()` interface) is a huge performance gain.

wyrand (the top spot of the linked benchmark), doesn't have this benefit. The computation can't be SIMDified and the extraction is always a single value.

So I would take these benchmark with a grain of salt and take a closer look at the specific situation for any application where RNG speed really matters.

Wouldn't a SIMDified implementation of other RNGs speed them up by a comparable factor, making MT relatively slower again?
If the next number depends on the previous number (like with wyrand and many others) it can't be simdified.

You could simdify computation with different seeds tho, which might be fine for many purposes. However at that point it's also a custom implementation with a different number sequence than the non-simd version, which isn't the case with a SIMDified MT.

Also xoshiro generators are enough cheaper than MT that even if you couldn't simdify it, you could keep 4 and get run the 4 together in SIMD to get 4 outputs way cheaper than 1 SIMD MT.
It looks like xorshiro was SIMDified in a paper linked in abother post in this tree.
Very useful! Thanks! Aren't there AES related intrinsics for some architectures that can be really fast for rng? How do they compare?