Hacker News new | ask | show | jobs
by rocqua 3070 days ago
Yes, if your data races can pollute pointers, that makes subsequent access of those pointers undefined behavior. But is the actual data race undefined behavior?

Take the following example from the article:

Thread A loads the value 0 Thread B loads the value 0 Thread A increments the value, calculating a result of 1 Thread B also calculates a result of 1 Thread A stores the value 1 Thread B stores the value 1

This is definitely indeterminate behavior, for the final value could also be 2. But if it is undefined behavior, the standard would also allow that final value to be 3 or 999 or -1 or INTMAX, it would allow anything.

To see a case where this matters, suppose you want an approximate total and you do this by concurrent addition to some shared variable. You expect overlapping additions to be so infrequent as to not have a large effect on the final value. Thus, you decide to not make that value atomic and accept the inaccuracy since it is an approximation anyway. Maybe you do this because atomic locks an entire cache-line, and something else in the cache-line is frequently accessed due to struct outline.

If the above scenario is actually 'undefined behavior' then this would be a really bad idea. You'd be at the mercy of the compiler not wanting to use this for any optimization.

1 comments

It's only undefined behavior if you use regular variables and regular increments. If you use the standard atomic functions/classes with std::memory_order:relaxed, it's legal, and provides what you want.
std::memory_order:relaxed would not cover my use case. This still requires locking, if we are using atomic<int> on x86 the locking is probably done at an assembly level which is much faster, but it remains locking.

Specifically, that form of locking marks a specific cache line 'locked' and blocks all processors from modifying that cache line. When you're variable happens to share that cache line with something you want quick access to, this stuf matters. (there is an exception using 'compare and swap' instructions, but I've only seen ICC emit those, not GCC.)

In general, I would expect data races to only create UB when the behavior they result in happens to be UB. In my specific example, there is no source of UB except for apparently the data race.

If you really want that, just do a relaxed read into a local variable, increment there and relaxed store afterwards. At least clang will give you exactly the assembly you want, minus the technically undefined behavior: https://godbolt.org/g/9ikt1m
memory_order::relaxed doesn't imply locking. Compilers on platforms where aligned access is atomic will generally implement relaxed reads and writes without locking, barriers or other expensive operations. So x = x + 1 with relaxed semantics should be cheap and best effort, like you want.