Hacker News new | ask | show | jobs
by ascar 1878 days ago
I don't know about all the other RNGs in this benchmark, but MT prepares a dense/contiguous array of random numbers. The whole computation can be SIMDified (often even by the compiler), but extraction is (unless changed explicitly) a single copy of a double.

If RNG speed matters and you need a lot of random numbers in succession (and I assume these two assumptions correlate strongly) editing MT to directly pull 4 or even 8 random numbers at once (i.e. a `__m256 rand_m256()` interface) is a huge performance gain.

wyrand (the top spot of the linked benchmark), doesn't have this benefit. The computation can't be SIMDified and the extraction is always a single value.

So I would take these benchmark with a grain of salt and take a closer look at the specific situation for any application where RNG speed really matters.

1 comments

Wouldn't a SIMDified implementation of other RNGs speed them up by a comparable factor, making MT relatively slower again?
If the next number depends on the previous number (like with wyrand and many others) it can't be simdified.

You could simdify computation with different seeds tho, which might be fine for many purposes. However at that point it's also a custom implementation with a different number sequence than the non-simd version, which isn't the case with a SIMDified MT.

Also xoshiro generators are enough cheaper than MT that even if you couldn't simdify it, you could keep 4 and get run the 4 together in SIMD to get 4 outputs way cheaper than 1 SIMD MT.
It looks like xorshiro was SIMDified in a paper linked in abother post in this tree.