| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by midjji 1879 days ago
	Recently noted that mt19937, mt19937_64 are much faster than the standard c++ random generator. They are also possibly better with regards to number distribution, but the performance difference is gigantic on clang. The standard 32 bit std::default_random_generator is almost 40x slower than the 64 bit mt19937_64 one across the board, from -o1 to -O3 march... etc. As all of them are also faster than old school rand, its worth upgrading for the performance increase if nothing else.

2 comments

FabHK 1879 days ago

I think both the PCG family and the xoroshiro family are faster still.

https://prng.di.unimi.it

https://www.pcg-random.org

ETA: https://github.com/lemire/testingRNG

link

ufo 1879 days ago

Does anyone know what's the current consensus about PCG vs xoroshiro. I remember that the xoroshiro author had several complaints about PCG but I don't know ifhe is right or if he was just being salty.

link

aDfbrtVt 1879 days ago

There's a bunch of analysis written by the PCG folks bashing Xoroshiro as well [1]. I think that for most normal use cases with randomized state, Xoroshiro is quite good. It only requires a single 64b add which is amazingly efficient.

[1] https://www.pcg-random.org/posts/xoshiro-repeat-flaws.html

link

rurban 1879 days ago

MT is pretty slow. There are fast variants (SFMT), but why use that when can have much better and much faster rngs?

https://rurban.github.io/dieharder/QUALITY.html

link

ascar 1879 days ago

I don't know about all the other RNGs in this benchmark, but MT prepares a dense/contiguous array of random numbers. The whole computation can be SIMDified (often even by the compiler), but extraction is (unless changed explicitly) a single copy of a double.

If RNG speed matters and you need a lot of random numbers in succession (and I assume these two assumptions correlate strongly) editing MT to directly pull 4 or even 8 random numbers at once (i.e. a `__m256 rand_m256()` interface) is a huge performance gain.

wyrand (the top spot of the linked benchmark), doesn't have this benefit. The computation can't be SIMDified and the extraction is always a single value.

So I would take these benchmark with a grain of salt and take a closer look at the specific situation for any application where RNG speed really matters.

link

jhgb 1879 days ago

Wouldn't a SIMDified implementation of other RNGs speed them up by a comparable factor, making MT relatively slower again?

link

ascar 1879 days ago

If the next number depends on the previous number (like with wyrand and many others) it can't be simdified.

You could simdify computation with different seeds tho, which might be fine for many purposes. However at that point it's also a custom implementation with a different number sequence than the non-simd version, which isn't the case with a SIMDified MT.

link

adgjlsfhk1 1879 days ago

Also xoshiro generators are enough cheaper than MT that even if you couldn't simdify it, you could keep 4 and get run the 4 together in SIMD to get 4 outputs way cheaper than 1 SIMD MT.

link

galangalalgol 1879 days ago

It looks like xorshiro was SIMDified in a paper linked in abother post in this tree.

link

galangalalgol 1879 days ago

Very useful! Thanks! Aren't there AES related intrinsics for some architectures that can be really fast for rng? How do they compare?

link

rainworld 1879 days ago

https://github.com/google/randen#performance

link