| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by rurban 1878 days ago
	MT is pretty slow. There are fast variants (SFMT), but why use that when can have much better and much faster rngs? https://rurban.github.io/dieharder/QUALITY.html

2 comments

ascar 1878 days ago

I don't know about all the other RNGs in this benchmark, but MT prepares a dense/contiguous array of random numbers. The whole computation can be SIMDified (often even by the compiler), but extraction is (unless changed explicitly) a single copy of a double.

If RNG speed matters and you need a lot of random numbers in succession (and I assume these two assumptions correlate strongly) editing MT to directly pull 4 or even 8 random numbers at once (i.e. a `__m256 rand_m256()` interface) is a huge performance gain.

wyrand (the top spot of the linked benchmark), doesn't have this benefit. The computation can't be SIMDified and the extraction is always a single value.

So I would take these benchmark with a grain of salt and take a closer look at the specific situation for any application where RNG speed really matters.

link

jhgb 1878 days ago

Wouldn't a SIMDified implementation of other RNGs speed them up by a comparable factor, making MT relatively slower again?

link

ascar 1878 days ago

If the next number depends on the previous number (like with wyrand and many others) it can't be simdified.

You could simdify computation with different seeds tho, which might be fine for many purposes. However at that point it's also a custom implementation with a different number sequence than the non-simd version, which isn't the case with a SIMDified MT.

link

adgjlsfhk1 1878 days ago

Also xoshiro generators are enough cheaper than MT that even if you couldn't simdify it, you could keep 4 and get run the 4 together in SIMD to get 4 outputs way cheaper than 1 SIMD MT.

link

galangalalgol 1878 days ago

It looks like xorshiro was SIMDified in a paper linked in abother post in this tree.

link

galangalalgol 1878 days ago

Very useful! Thanks! Aren't there AES related intrinsics for some architectures that can be really fast for rng? How do they compare?

link

rainworld 1878 days ago

https://github.com/google/randen#performance

link