Hacker News new | ask | show | jobs
by withoutboats3 384 days ago
The occurrence of data races depends on the specific non-deterministic sequence of execution of concurrent codepaths. Just because you have 100% code coverage does not mean you've covered every potential execution sequence, and its almost never practical to actually execute every possibility to ensure the absence of data races. Depending on the probability that your data race will occur, it could indeed be something you have to make stars align for TSAN to catch.

Not to talk my own book, but there is a well-known alternative to C++ that can actually guarantee the absence of data races.

1 comments

It "could" for some algorithms, yes, but for a lot of algorithms, that kind of star alignment simply isn't necessary to find all the data races, was my point. And yes, TLA+ etc. can be helpful, but then you have the problem of matching them up with the code.
I feel like in a subtle way you're mixing up data races with race conditions, especially given the example you site about incrementing an atomic variable.

TSAN does not check for race conditions in general, and doesn't claim to do so at all as the documentation doesn't include the term race condition anywhere. TSAN is strictly for checking data races and deadlocks.

Consequently this claim is false:

>The issue is that even if it statically proved the absence of data races in the C++ sense, that still wouldn't imply that your algorithm is race-free.

Race-free code means absence of data races, it does not mean absence of the more general race condition. If you search Google Scholar for race free programming you'll find no one uses the term race-free to refer to complete absence of race conditions but rather to the absence of data races.

There's "data race" in "C++ ISO standard" sense, and then there's "data race" in the general CS literature (as well as all the other terms). Two threads writing a value to the same memory location (even atomically) is usually a data race in the CS/algorithm sense (due to the lack of synchronization), but not the C++ sense. I'm not really interested in pedantic terminology here, just trying get a higher level point across about what you can & can't assume with a clean TSAN (and how not to clean your TSAN errors). Feel free to mentally rewrite my comment with whatever preferred terminology you feel would get my points across.
This isn't pedantry, if you're going to talk about how specific tools work then you need to use the actual terminology that the tools themselves use or else you will confuse yourself and anyone you talk to about them. If we were discussing general concepts about thread safety then sure we can be loose about our words, but if we're talking about a specific tool used for a specific programming language then we should make sure we are using the correct terminology, if only to signal that we have the proper domain knowledge to speak about the subject.

>Feel free to mentally rewrite my comment with whatever preferred terminology you feel would get my points across.

If I rewrite your comment to use data race, then your comment is plainly incorrect since the supporting example you give is not a data race but a race condition.

If I rewrite your comment to use race condition, then your comment is also incorrect since TSAN doesn't detect race conditions in general and doesn't claim to, it detects data races.

So what exactly am I supposed to do in order to make sense of your post?

The idea that you'd talk about the pros and cons of a tool like TSAN without knowing the difference between a race condition and a data race is kind of absurd. That you'd claim my clarification of these terms for the sake of better understanding your point is a form of pedantry is sheer hubris.

Hold on, before attacking me. Say we have this Java program, and assume the semantics of the common JRE/JVM everyone uses. Do you believe it has a data race or not? Because the variable is accessed atomically, whether whether you mark it as volatile or not:

  class Main {
    private static String s = "uninitialized";
    public static void main(String[] args) {
      Thread t = new Thread() {
        public void run() { s = args[0]; }
      };
      t.start();
      System.out.println(s);
    }
  }
And I sure as heck have not heard anyone claim such data races are impossible in Java. (Have you?)
>When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race.

Yes, your program contains a data race, by the definition used in the JLS. The set of outcomes you may observe from a data race are specified. I'm not sure if this choice was intentional or not, but there is a guarantee that you will either print the argument or "uninitialized" and no other behavior, because String relies on final field semantics. This is would not be true in c/c++ where the equivalent code is undefined behavior and you could see any result.

In Java you can have a data race and use it productively for certain niche cases, like String.hashcode - I've also contributed some to the Guava concurrency library. This is not true in c/c++ where data races (by their definition) are undefined behavior. If you want to do the tricks you can in racy Java without UB, you have to declare your variables atomic and use relaxed memory order and possibly fences.

What you wrote is indeed a data race, it'll race s, but you mention the semantics of the JRE and I wonder if you actually know what those are because that's crucial here.

You see Java has a specific memory ordering model (many languages just give you a big shrug, including C before it adopted the C++ 11 model but Java spells out what happens) and that model is very sophisticated so it has an answer to what happens here.

Because we raced s, we lose Sequential Consistency. This means in general (this example is so trivial it won't matter) humans struggle to understand what's going on in their program, which makes debugging and other software engineering impractical. But, unlike C++ loss of Sequential Consistency isn't fatal in Java, instead we're promised that when s is observed in the main thread it will either be that initial "uninitialized" string or it will have the args[0] value, ie the name of the program because these are the only two values it could have and Java does not specify which of them observed in this case.

You could think of this as "atomic access" and that's likely the actual implementation in this case, but the Java specification only promises what I wrote.

In C++ this is game over, the language standard specifically says it is Undefined Behaviour to have any data race and so the behaviour of your program is outside the standard - anything at all might happen.

[Edited: I neglected originally to observe that s is set to "uninitialized", and instead I assumed it begins as null]

The Java specification which can be found here [1] makes clear that with respect to its memory model the following is true:

    1. Per 17.4.5 your example can lead to a data race.

       "When a program contains two conflicting accesses (§17.4.1) that are not ordered by a happens-before relationship, it is said to contain a data race."

    2. Per 17.7 the variable s is accessed atomically.

       "Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values."
However, atomic reads and writes are not sufficient to protect against data races. What atomic reads and writes will protect against is word tearing (outlined in 17.6 where two threads simultaneously write to overlapping parts of the same object with the result being bits from both writes mixed together in memory). However, a data race involving atomic objects can still result in future reads from that object to result in inconsistent values, and this can last indefinitely into the future. This does not mean that reading from a reference will produce a garbage value, but it does mean that two different threads reading from the same reference may end up reading two entirely different objects. So, you can have thread A in an infinite loop repeatedly reading the value "uninitialized" and thread B in another infinite loop repeatedly reading the value args[0]. This can happen because both threads have their own local copy of the reference which will never be updated to reflect a consistent shared state.

As per 17.4.3, a data-race free program will not have this kind of behavior where two threads are in a perpetually inconsistent state, as the spec says "If a program has no data races, then all executions of the program will appear to be sequentially consistent."

So while atomicity protects against certain types of data corruption, it does not protect against data races.

[1] https://docs.oracle.com/javase/specs/jls/se24/html/jls-17.ht...

> Two threads writing a value to the same memory location (even atomically) is usually a data race in the CS/algorithm sense (due to the lack of synchronization), but not the C++ sense

You seem to conflate the concepts of "data race" and "race condition", which are not the same thing.

Two threads writing to the same memory location without synchronization (without using atomic operations, without going thru a synchronization point like a mutex, etc.) is a data race, and almost certainly also a race condition. If access to that memory location is synchronized, whether thru atomics or otherwise, then there's no data race, but there can still be a race condition.

This isn't a pedantic distinction, it's actually pretty important.

> Two threads writing a value to the same memory location (even atomically) is usually a data race in the CS/algorithm sense (due to the lack of synchronization), but not the C++ sense

Not only are you incorrect, it’s even worse than you might think. Unsynchronized access to data in c++ is not only a data race, it’s explicitly undefined behavior and the compiler can choose to do whatever in response of an observed data race (which you are promising it isn’t possible by using the language).

You are also misinformed about the efficacy of TSAN. Even in TSAN you have to run it in a loop - if TSAN never observes the specific execution order in a race it’ll remain silent. This isn’t a theoretical problem but a very real one you must deeply understand if you rely on these tools. I recall a bug in libc++ with condition_variable and reproducing it required running the repro case in a tight loop like a hundred times to get even one report. And when you fixed it, how long would you run to have confidence it was actually fixed?

And yes, race conditions are an even broader class of problems that no tool other than formal verification or DST can help with. Hypothesis testing can help mildly but really you want at least DST to probabilistically search the space to find the race conditions (and DST’s main weakness aside from the challenge of searching a combinatorial explosion of states is that it still relies on you to provide test coverage and expectations in the first place that the race condition might violate)

TSAN observes the lack of an explicit order and warns about that, so it is better in some sense than just running normally in a loop and hoping to see the occurrence of a specific mis-ordering. But that part of it is a data race detector, so it cannot do anything for race conditions, and as soon as something is annotated as atomic, it cannot do anything to detect misuse. It can be better for lock evaluation, as it can check they are always acquired in the same order without needing to actually observe a conflicting deadlock occurring. But I agree you need more formal tooling to actually show the problem is eliminated and not just improbable
Geez. I'm well aware it's UB. That was just sloppy wording. Should've said "not necessarily", okay. I only wrote "not in the C++ sense" because I had said "even atomically", not because I'm clueless.
The alternative to C++ that I meant was Rust, which statically prevents data races.
...so long as your code doesn't use unsafe, neither directly nor transitively.