Hacker News
by TillE 683 days ago
> I didn't know std::uniform_int_distribution doesn't actually produce the same results on different compilers

I think this is genuinely my biggest complaint about the C++ standard library. There are countless scenarios where you want deterministic random numbers (for testing if nothing else), so std's distributions are unusable. Fortunately you can just plug in Boost's implementation.
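
Some background that may help here (a sketch, not from the thread): the standard does fully specify the random number engines. For example, it requires the 10000th consecutive output of a default-constructed `std::mt19937` to be 4123659995. Only the distributions, such as `std::uniform_int_distribution`, are left to the implementation, so deterministic code can keep a standard engine and replace just the distribution step. The helper name `mt19937_10000th` is hypothetical.

```cpp
#include <cstdint>
#include <random>

// Engine output sequences are fixed by the standard, so this returns the
// same value with libstdc++, libc++ and MSVC's STL.
std::uint32_t mt19937_10000th() {
    std::mt19937 engine;    // default seed (5489)
    engine.discard(9999);   // skip the first 9999 outputs
    return static_cast<std::uint32_t>(engine());  // the 10000th output
}
```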

It's actually really important that uniform_int_distribution is implementation defined. The 'right' way to do it on one architecture is probably not the right way to do it on a different architecture.

For instance, Apple's new CPUs have very fast division. A convenient and useful way to implement uniform_int_distribution relies on the modulo operation, so the implementation that runs on Apple's new CPUs ought to compute the modulo with the CPU's fast hardware divide.
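
To make the modulo-based approach concrete, here is a minimal sketch of textbook rejection sampling, assuming a 32-bit `std::mt19937` engine. It is not the code of any particular standard library, and `bounded_mod` is a hypothetical name. Draws that fall in the final, incomplete run of `n` values are rejected, so a single `%` on each accepted draw gives an unbiased result. Because it depends only on the engine's specified output, it returns the same values on every compiler.

```cpp
#include <cstdint>
#include <random>

// Returns a value uniformly distributed in [0, n).
// Precondition (assumed): 1 <= n <= 2^32 - 1.
std::uint32_t bounded_mod(std::mt19937& engine, std::uint32_t n) {
    const std::uint64_t span = std::uint64_t{1} << 32;  // number of possible 32-bit draws
    const std::uint64_t limit = span - (span % n);      // largest multiple of n not above span
    for (;;) {
        const std::uint64_t x = engine();               // draw in [0, 2^32)
        if (x < limit) {
            return static_cast<std::uint32_t>(x % n);   // one division per accepted draw
        }
        // x landed in the biased tail: draw again
    }
}
```

For example, `bounded_mod(engine, 6) + 1` yields a die roll in [1, 6]. On a CPU with fast hardware division the `%` is cheap; elsewhere the division tends to dominate the cost.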

On other architectures, the ISA might not even have a modulo instruction. In that case, it's very important not to emulate modulo in software; it's much better to rely on other, more complicated constructs to produce a uniform distribution.
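
For targets without fast division, one division-free alternative (a swapped-in technique, not necessarily what any standard library uses) is bitmask rejection: mask each draw down to the smallest all-ones bit pattern that covers `n - 1`, then retry if the result is still `>= n`. `bounded_mask` is a hypothetical name, and the sketch again assumes `std::mt19937`.

```cpp
#include <cstdint>
#include <random>

// Returns a value uniformly distributed in [0, n) without any division.
// Precondition (assumed): 1 <= n <= 2^32 - 1.
std::uint32_t bounded_mask(std::mt19937& engine, std::uint32_t n) {
    // Smear the highest set bit of n - 1 downwards to get 2^k - 1 >= n - 1.
    std::uint32_t mask = n - 1;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    for (;;) {
        const std::uint32_t x = static_cast<std::uint32_t>(engine()) & mask;
        if (x < n) {
            return x;  // accepted: uniform over [0, n)
        }
        // x is in [n, 2^k): reject and draw again
    }
}
```

It trades the division for a possibly higher retry rate.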

C++ is also expected to run on GPUs. NVIDIA's CUDA and AMD's HIP are both implementations of C++. (These implementations are non-compliant given the nature of GPUs, but both their vendors and the C++ standards committee share the goal of narrowing that gap.) In general, std::uniform_int_distribution uses rejection loops to eliminate bias. The 'happy path' has relatively predictable branches, but there are cases where the branch is not easily predicted and the loop will run again as often as not before it completes. Doing this on a GPU might be multiple orders of magnitude slower than another method that's better suited to a GPU.
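
To quantify the branching concern under one illustrative assumption: suppose the loop masks each draw to the smallest power of two $2^k \ge n$ and retries when the result is $\ge n$. Each iteration then succeeds independently with probability $n/2^k$, so the number of iterations is geometric:

```latex
p_{\text{accept}} = \frac{n}{2^{k}}, \qquad
\mathbb{E}[\text{iterations}] = \frac{2^{k}}{n}, \qquad
2^{k-1} < n \le 2^{k} \;\Rightarrow\; 1 \le \mathbb{E}[\text{iterations}] < 2.
```

The expected count stays below 2, but for $n = 2^{k-1} + 1$ almost half of all draws are rejected. On a GPU, any lane of a warp that needs another iteration keeps the whole warp in the loop, which is where the divergence cost comes from.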

Overzealously dictating an implementation is why C++ ended up with a relatively bad hash table and very bad regex in the standard. It's a mistake that shouldn't be made again.

But reproducibility is as important as performance for the vast majority of use cases once these implementation-defined details start to affect observable outcomes. (That's why we specify the required time complexity of many container operations but not the exact algorithm: a difference in Big-O time complexity is just large enough to be "observed".)

A common solution is to provide two versions of such features: one that is maximally performant but less reproducible, and another that hits a common middle ground and can be reproduced reasonably efficiently across many common platforms. In fact, I believe `std::chrono` was designed that way to sidestep many uncertainties in platform clock implementations.

> Overzealously dictating an implementation is why C++ ended up with a relatively bad hash table and very bad regex in the standard.

What parts of the standard dictate a particular regex implementation? IIRC the performance issues are usually blamed on ABI compatibility constraints rather than the standard making a fast(er) implementation impossible.

Nobody uses the standard library for high-performance random number generation.

> I think this is genuinely my biggest complaint about the C++ standard library

What do you think of Abseil's hash tables randomizing their iteration order (piggybacking on ASLR) on each run of your program?

Their justification is here https://github.com/abseil/abseil-cpp/issues/720

However, I personally disagree with them, since I think it's really important to have _some_ basic reproducibility for things like reproducing the results of a randomized test. In that case, I'm going to avoid changing as much as possible anyway.
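
Whatever one thinks of Abseil's choice, a common way to keep randomized hash seeds out of test results is to never let iteration order reach test-visible output. A minimal sketch with `std::unordered_map`, whose iteration order is likewise unspecified; `sorted_entries` is a hypothetical helper, and the same approach should carry over to Abseil's containers:

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Copies the entries out of the hash table and sorts them by key, so any
// output built from the result no longer depends on hash seeding or
// bucket layout.
std::vector<std::pair<std::string, int>>
sorted_entries(const std::unordered_map<std::string, int>& table) {
    std::vector<std::pair<std::string, int>> entries(table.begin(), table.end());
    std::sort(entries.begin(), entries.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    return entries;
}
```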

> There are countless scenarios where you want deterministic random numbers (for testing if nothing else), so std's distributions are unusable. Fortunately you can just plug in Boost's implementation.

I don't understand your complaint. If you're already plugging in alternative implementations, what stops you from stubbing these random number generators with any implementation at all?
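
As one illustration of such a stub, here is a sketch of a hypothetical `ReplayGenerator` class that satisfies the standard's UniformRandomBitGenerator requirements (`result_type`, `min()`, `max()`, `operator()`) but replays a fixed list of values. Passing it to `std::uniform_int_distribution` would still go through the implementation-defined mapping, so it is most useful for code that consumes raw generator output or applies a portable mapping of its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Test stub: deterministic "generator" that cycles through given values.
// Precondition (assumed): values is not empty.
class ReplayGenerator {
public:
    using result_type = std::uint32_t;

    explicit ReplayGenerator(std::vector<result_type> values)
        : values_(std::move(values)) {}

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }

    result_type operator()() {
        const result_type v = values_[next_ % values_.size()];  // wrap around at the end
        ++next_;
        return v;
    }

private:
    std::vector<result_type> values_;
    std::size_t next_ = 0;
};
```

For instance, `ReplayGenerator g({1, 2, 3});` returns 1, 2, 3, 1, ... on successive calls.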

It's a compromised and goofy implementation with lots of warts. What's the point in having a /standard/ library then?

> It's a compromised and goofy implementation with lots of warts.

I don't think this case qualifies as an example. I think the only goofy detail in the story is expecting a random number generator to be non-random and deterministic, when the only conceivable use case is poorly designed and implemented test fixtures.

> What's the point in having a /standard/ library then?

The point of standardized components is to provide reusable elements that can be used across all platforms and implementations, thus saving on the development effort of upgrading and porting the code across implementations and even platforms. If you cannot design working software, that's not a problem you can pin on the tools you don't know how to use.

> The point of standardized components is to provide reusable elements that can be used across all platforms and implementations, thus saving on the development effort of upgrading and porting the code across implementations and even platforms.

It's a shame that C++'s "standardized" components ARE COMPLETELY DIFFERENT on different platforms.

Some of the C++ standard requires per-platform implementation work. For example, std::thread obviously must have different implementations on Linux and Windows. However, a supermajority of the standard API is just vanilla C++ code, for example std::vector or std::unordered_map. The fact that the standard defines a spec which is then implemented numerous times is absurd, stupid, and bad. The specs are simultaneously over-constrained and under-constrained. It's a disaster.

I consider the current tradeoff to be a feature.

It permits implementations to take advantage of target-specific affordances (your thread case is an example) as well as taking different implementation strategies (e.g. the small string optimization is different in libc++ and libstdc++). Also you may use another, independent standard library because you prefer its implementation decisions. Meanwhile they remain compatible at the source level.
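
The small string optimization difference is easy to observe: in the mainstream implementations, a default-constructed `std::string` reports its inline buffer size as its capacity. As an assumption about typical 64-bit builds (not a guarantee from the standard), this is commonly 15 with libstdc++'s C++11 ABI and MSVC's STL, and 22 with libc++. `inline_capacity` is a hypothetical helper:

```cpp
#include <cstddef>
#include <string>

// Not guaranteed by the standard: in libstdc++ (C++11 ABI), libc++ and
// MSVC's STL, an empty std::string's capacity() equals the number of chars
// that fit in its inline small-string buffer.
std::size_t inline_capacity() {
    return std::string().capacity();
}
```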

Unlike in C, in C++ it is not possible to use an independent implementation of the standard library.

Clang is compatible with GCC's standard library (libstdc++) and with MSVC's standard library because the Clang compiler explicitly supports them, but it's not possible to use Clang's standard library (libc++) with GCC in a standard-conforming way, or to interchange GCC's and MSVC's standard libraries.

There are some hacks that let you use some parts of libc++ with GCC via the -nostdlib flag, but this disables a lot of C++ functionality such as exception handling, RTTI, and type traits. These features are in turn used by things like std::vector, std::map, etc., so you won't be able to use those classes either, and so on and so forth...

> as well as taking different implementation strategies (e.g. the small string optimization is different in libc++ and libstdc++)

As a user, this is really not a good thing when the stdlib is tied to the platform. In the end, the only sane thing to do if you want any reproducibility across operating systems is to use libc++ exclusively, everywhere.

Hard, hard disagree.

If you want to support different implementation strategies, it needs to be far more piecemeal, not all or nothing. I mean, there are only 3 meaningful implementations: libstdc++, libc++, and MSVC's. And they aren't wholly interchangeable!

Quite frankly, if you value trying different implementation strategies, then the C++ model is a complete and total failure. A successful model would have many, many different implementations of different components. The fact that there are just 3 is an objective failure.

std::deque, a container with some quite useful theoretical properties, is completely unusable because the node (block) size is not specifiable by the user and MSVC chose 16 bytes (I think; insanely small nonetheless).

> with the only conceivable usecase being poorly designed and implemented test fixtures.

Reproducible pseudo-randomness is a necessity for fuzz testing. It is not a poor design approach when it is actually useful.

It is reproducible within a single standard library implementation, so it is usable for fuzz testing.
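
If the inputs must also be reproducible across standard libraries (say, replaying under MSVC a failure found on Linux), one option is to log the seed and derive the input only from the engine's raw output, which the standard fully specifies, instead of going through a `std::` distribution. A sketch, with `make_fuzz_input` as a hypothetical helper:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Builds a fuzz input purely from std::mt19937 output (whose sequence is
// specified by the standard), so the same seed yields the same bytes with
// any conforming standard library.
std::vector<std::uint8_t> make_fuzz_input(std::uint32_t seed, std::size_t length) {
    std::mt19937 engine(seed);
    std::vector<std::uint8_t> bytes(length);
    for (auto& b : bytes) {
        b = static_cast<std::uint8_t>(engine() & 0xFFu);  // low 8 bits; no distribution involved
    }
    return bytes;
}
```

In a test run, the seed could come from `std::random_device{}()`; printing it before calling `make_fuzz_input(seed, 64)` lets any failing case be regenerated exactly.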