by nwallin
688 days ago
It's actually really important that uniform_int_distribution is implementation defined. The 'right' way to do it on one architecture is probably not the right way to do it on a different architecture.

For instance, Apple's new CPUs have very fast division. A convenient and useful way to implement uniform_int_distribution relies on modulo, so an implementation that runs on Apple's new CPUs ought to use the CPU's modulo instructions. On other architectures, the ISA might not even have a modulo instruction. In that case, it's very important that you don't try to emulate modulo in software; it's much better to rely on other, more complicated constructs to give a uniform distribution.

C++ is also expected to run on GPUs. NVIDIA's CUDA and AMD's HIP are both implementations of C++. (These implementations are non-compliant given the nature of GPUs, but both they and the C++ standards committee share the goal of narrowing that gap.) In general, std::uniform_int_distribution uses loops to eliminate redundancies; the 'happy path' has relatively easily predicted branches, but there are cases where the branch is not easily predicted and the code will as often as not have to loop in order to complete. Doing this on a GPU might be multiple orders of magnitude slower than another method that's better suited to a GPU.

Overzealously dictating an implementation is why C++ ended up with a relatively bad hash table and very bad regex in the standard. It's a mistake that shouldn't be made again.
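To make the trade-off concrete, here is a rough sketch of two ways to map a 32-bit generator output onto [0, n). Neither is taken from any particular standard library; `bounded_modulo` and `bounded_multiply` are hypothetical names, and both assume the generator yields uniform 32-bit values (as std::mt19937 does) and that n > 0. The first pays for a modulo on every call; the second (a multiply-and-shift scheme usually attributed to Daniel Lemire) only touches the modulo on a rare slow path. Both contain a retry loop of the kind described above.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: rejection sampling followed by a modulo.
// Assumes g() yields uniform values in [0, 2^32) and n > 0.
template <class Gen>
std::uint32_t bounded_modulo(Gen& g, std::uint32_t n) {
    // 2^32 mod n, computed in 32-bit unsigned arithmetic.
    const std::uint32_t threshold = (std::uint32_t(0) - n) % n;
    for (;;) {
        const std::uint32_t x = static_cast<std::uint32_t>(g());
        // Rejecting the lowest `threshold` values leaves a multiple of n
        // candidates, so x % n is unbiased.
        if (x >= threshold) return x % n;
    }
}

// Hypothetical helper: multiply-and-shift with a rare rejection path.
// The common path needs one 64-bit multiply and no division.
template <class Gen>
std::uint32_t bounded_multiply(Gen& g, std::uint32_t n) {
    std::uint64_t m = std::uint64_t(static_cast<std::uint32_t>(g())) * n;
    std::uint32_t low = static_cast<std::uint32_t>(m);
    if (low < n) {
        // Slow path: only here is the modulo actually computed.
        const std::uint32_t threshold = (std::uint32_t(0) - n) % n;
        while (low < threshold) {
            m = std::uint64_t(static_cast<std::uint32_t>(g())) * n;
            low = static_cast<std::uint32_t>(m);
        }
    }
    return static_cast<std::uint32_t>(m >> 32);
}
```

Which of these is faster depends on the target: the first leans on hardware division, the second on a wide multiply, and both have a data-dependent branch that a GPU would likely handle poorly.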
|
A common solution is to provide two versions of such features: one that is maximally performant but less reproducible, and another that takes a common middle ground and can be reproduced reasonably efficiently across many common platforms. In fact I believe `std::chrono` was designed that way to sidestep many uncertainties in platform clock implementations.
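As a rough illustration of what such a pair could look like for random integers (the names `pick_fast` and `pick_portable` are made up for this sketch and are not part of any library): the fast version simply defers to the implementation-defined `std::uniform_int_distribution`, while the portable version pins down one fixed algorithm on top of `std::mt19937`, whose raw output sequence is specified by the standard. Both assume n > 0.

```cpp
#include <cstdint>
#include <random>

// Hypothetical "fast" variant: whatever the standard library does.
// The same seed may give different results on different implementations.
inline std::uint32_t pick_fast(std::mt19937& g, std::uint32_t n) {
    return std::uniform_int_distribution<std::uint32_t>(0u, n - 1u)(g);
}

// Hypothetical "portable" variant: a fixed rejection-plus-modulo scheme.
// Because the raw output of std::mt19937 is specified, the same seed
// should give the same results wherever this exact code is compiled.
inline std::uint32_t pick_portable(std::mt19937& g, std::uint32_t n) {
    // 2^32 mod n, computed in 32-bit unsigned arithmetic.
    const std::uint32_t threshold = (std::uint32_t(0) - n) % n;
    for (;;) {
        const std::uint32_t x = static_cast<std::uint32_t>(g());
        if (x >= threshold) return x % n;  // unbiased after rejection
    }
}
```

A caller who needs replayable simulations or cross-platform tests would reach for the portable one; everyone else gets whatever is fastest on their target.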