Hacker News new | ask | show | jobs
by chrisseaton 1360 days ago
I don't know where Wikipedia got this 'substantive behaviour' requirement from, but for example it explicitly isn't part of the definition in the industry's definitive reference for parallelism - Padua.

> A race condition occurs in a parallel program execution when two or more threads access a common resource, e.g., a variable in shared memory, and the order of the accesses depends on the timing, i.e., the progress of individual threads.

> The disposition for a race condition is in the parallel program. In different executions of the same program on the same input access events that constitute a race can occur in different order, which may but does not generally result in different program behaviors (non-determinacy).

Sometimes you deliberately program a full-on data race (which isn't a bug by definition, as the article says) for performance reasons.

> Data races that are not manifestations of program bugs are called benign data races.

4 comments

That definition seems to conflate "determinism" (something largely impossible to achieve in asynchronously parallel systems) with "correctness" (an abstracted property of a system that doesn't have anything to do with determinism per se). It just doesn't seem useful.

No, in overwhelmingly common usage, programmers use the term "race condition" as a category of software bug. We mean it in the correctness sense, not the one used in the linked article nor your reference. You'd be met with some very weird stares if you tried to explain how arbitrary SMP ordering of log entries or whatever was a "race condition".

> No, in overwhelmingly common usage, programmers use the term "race condition" as a category of software bug

This was exactly my point.

    race condition = incorrect behavior
    non-determinism = correct behavior
Language is tricky sometimes.
> That definition seems to conflate "determinism" (something largely impossible to achieve in asynchronously parallel systems)

There are plenty of examples of entirely deterministic parallel models. Fork-join for one.

I think you missed the point. Let me expand.

I mean, yes, what you say is true, but it just amounts to saying "synchronization is a solvable problem". At their core, ALL synchronization paradigms (spin polling, interrupt masking, OS-managed process suspend, hardware memory barriers, weird lockless tricks like Dekker's algorithm, you name it) can be understood to be ways of enforcing "para-determinism" on environments that don't provide it.

In SMP, things don't happen in reliable orders. They don't. They never will. They can't. But your hardware and OS platforms are smart and provide trickery so that at least you can guarantee that "some" things happen in reliable orders. And then you construct software such that correct behavior is dependent only on that subset of ordering and not everything.

Because there is no determinism in SMP, only that which you construct.

If you can't observe the non-determinism, then is it really non-determinism?

Your processor executes a single thread of instructions also in a non-deterministic order, based on complex internal state. We'd never say it was non-deterministic, as you can't detect it.

You can absolutely observe non-determinism. That's what a race condition bug is, after all. You can also observe benign nondeterminism every day in anything that records ordering. Go do a parallel make, then do it again and observe the ordering of the mtime values in the intermediate artifacts. It won't be the same.

And FWIW: CPU-internal instruction reordering is observable too, though the details there get complicated. x86 hides (almost!) all the complexity from you, but ARM's OOO is leakier and requires careful attention to memory barriers.

It's visible from the perspective of the software engineer who molds non-deterministic abstractions into predictable interactions for the user.
It’s worth noting that algorithms can be designed correct under nondeterministic execution. For example, quicksort is correct with a randomly selected pivot. And for associative and commutative functions it doesn’t matter what order they’re executed in the final result is always the same.

Dijkstra’s guarded commands don’t specify an order for the conditional. The semantics is that the process is free to execute any one of the cases that has a true guard. Nevertheless he found them useful for developing and describing many algorithms.

> And for associative and commutative functions it doesn’t matter what order they’re executed in the final result is always the same.

Not directed at you specifically, but just a reminder for anyone who hasn’t been burned by it yet: floating point addition is not associative. (a+b)+c != a+(b+c). It’s close, but if you’re not careful you can get bad results where the accumulated small errors turn into wrong answers.

Not just floating point maths either. The built-in "integers" in your computer don't behave like the integers taught in school either but that's how we tend to think about them. This is why I argue trapping is the most acceptable choice for arithmetic problems in general purpose languages, if the program tries to add 150 to 150 in an unsigned 8-bit integer, that's a programming error. By all means offer wrapping arithmetic, sometimes it's what the programmer actually wanted and they benefit from being able to clarify that - but most often they didn't want that and would be astonished that 150 + 150 = 44.
It can overflow, but integer math is associative and commutative still, isn’t? It’s just modulo the biggest representable number (if unsigned).
> It’s worth noting that algorithms can be designed correct under nondeterministic execution.

In fact the whole concept of "symmetric multiprocessing" demands it.

By this definition, every access to a shared resource is a race condition, e.g. even when properly acquiring a lock. It is common knowledge that you introduce locks to remove race conditions so I would say something is definitely missing from the definition.
I think "remove" the race condition is the wrong word. It should be "resolve" the race conditions.

The race exists because there are multiple accesses. This is resolved when there is a protocol for deciding who proceeds and who waits for the other.

Usually you’re adding locks not to remove the race condition but to make them not a bug.
That is a valid definition but it's not the one used by everyone I know and work with. A race condition is a fault, just like a SQL injection; it could easily have referred to a programming technique but it doesn't.
Hey, I've just downloaded PADUA (http://dx.doi.org/10.1145/2633685), and skimming through it, I can't find a single mention of the phrase "race condition".

Is this the paper you're referring to? If not, could you please provide a reference to which PADUA you're referring to? I'd really like to read more on the subject, especially if the source is, as you claim, an industry reference.

Thanks!

If you have any relevant information, would you care to elaborate more on why is it considered industry standard and how did it gain such status?

This is the first time I'm hearing of both the author and the book (which probably says more about me), and it seems kind of odd that the definition differs from what is usually taught in class (at least mine). Of course, it wouldn't be the first time that popular use of some word differs from its original intended meaning, but I'm just interested in the wider historical context.

Thanks again.

I don't know what else to do but appeal to authority - it's the definition of a word after all?

https://cs.illinois.edu/about/people/faculty/padua

> David Padua has served as program committee member, program chair, or general chair to more than 70 conferences and workshops. He was the Editor-in-Chief of Springer‐Verlag’s Encyclopedia of Parallel Computing and is currently a member of the editorial board of the Communications of the ACM, the Journal of Parallel and Distributed Computing, and the International Journal of Parallel Programming.