Hacker News new | ask | show | jobs
by aorav 3070 days ago
The low-level nature of C++ (and also C) makes data races unavoidably undefined behavior. Imagine a data race that causes a torn write on a pointer. This pointer could now point pretty much anywhere. The function stack, some vtable, program data, etc. Any write to that dereferenced pointer will alter the state of your program in such a way that the behavior of your program will become undefined. There would be no way for C++ to guarantee any kind of defined behavior when you can write to arbitrary locations in memory.
2 comments

There are two different things that people can mean when they say "undefined behaviour" in regards to C/C++. One is what you're talking about, with "undefined behaviour" meaning "one of several things could happen, you don't know which, and some of them may be bad."

The other is more specifically talking about code that is deemed "undefined behaviour" by the spec. There's a surprising amount of this, and the compiler is explicitly allowed to do anything it wants when encountered, from emitting 'close enough' code to emitting no code at all, to emitting code to order a pizza to your current location.

If this is the latter case then it could result in random bits of code mysteriously doing completely unexpected things if the compiler thinks a data race there is possible. This would make it far harder to track down bugs and reason about what code might cause such.

Yes, if your data races can pollute pointers, that makes subsequent access of those pointers undefined behavior. But is the actual data race undefined behavior?

Take the following example from the article:

Thread A loads the value 0 Thread B loads the value 0 Thread A increments the value, calculating a result of 1 Thread B also calculates a result of 1 Thread A stores the value 1 Thread B stores the value 1

This is definitely indeterminate behavior, for the final value could also be 2. But if it is undefined behavior, the standard would also allow that final value to be 3 or 999 or -1 or INTMAX, it would allow anything.

To see a case where this matters, suppose you want an approximate total and you do this by concurrent addition to some shared variable. You expect overlapping additions to be so infrequent as to not have a large effect on the final value. Thus, you decide to not make that value atomic and accept the inaccuracy since it is an approximation anyway. Maybe you do this because atomic locks an entire cache-line, and something else in the cache-line is frequently accessed due to struct outline.

If the above scenario is actually 'undefined behavior' then this would be a really bad idea. You'd be at the mercy of the compiler not wanting to use this for any optimization.

It's only undefined behavior if you use regular variables and regular increments. If you use the standard atomic functions/classes with std::memory_order:relaxed, it's legal, and provides what you want.
std::memory_order:relaxed would not cover my use case. This still requires locking, if we are using atomic<int> on x86 the locking is probably done at an assembly level which is much faster, but it remains locking.

Specifically, that form of locking marks a specific cache line 'locked' and blocks all processors from modifying that cache line. When you're variable happens to share that cache line with something you want quick access to, this stuf matters. (there is an exception using 'compare and swap' instructions, but I've only seen ICC emit those, not GCC.)

In general, I would expect data races to only create UB when the behavior they result in happens to be UB. In my specific example, there is no source of UB except for apparently the data race.

If you really want that, just do a relaxed read into a local variable, increment there and relaxed store afterwards. At least clang will give you exactly the assembly you want, minus the technically undefined behavior: https://godbolt.org/g/9ikt1m
memory_order::relaxed doesn't imply locking. Compilers on platforms where aligned access is atomic will generally implement relaxed reads and writes without locking, barriers or other expensive operations. So x = x + 1 with relaxed semantics should be cheap and best effort, like you want.