by rwbhn 1584 days ago
Your statement about your sample code is incorrect - just confirmed it for myself (CPython 3.8.3 is what was handy). Just add those print statements to the example in the article (before/after labels make analysis easier) and also add a set of threads that just do 'x = 0'.

If you have a thread decrease the value of x then of course it's possible for x to decrease in value. The point is if you have a bunch of threads where every thread only ever increases the value of x like in the article then in Python it will never be the case that the value of x decreases.

In languages where data races are possible you can have every thread only ever increment x, and yet x's value will decrease due to a data race.

Python's GIL will protect against data races, meaning that certain classes of bugs where a specific memory location takes on a completely arbitrary value will never be observed from a Python program.

As reitzensteinm points out, even if every thread does "x = x + 1", x can decrease.

x starts as 0.

Thread A evaluates x + 1, getting 1.

Thread B executes the full statement 100 times, setting x to 100.

Thread A finishes executing the statement, setting x to 1.

So x decreased from 100 to 1.
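
The interleaving above can be forced deterministically. This is only a sketch: the events, the `tmp` variable and the function name are my own additions for illustration, and `tmp` stands in for the value the interpreter holds between evaluating `x + 1` and storing it. In real code the scheduler picks the switch point, so the effect is intermittent rather than guaranteed.

```python
import threading

def run_forced_interleaving():
    state = {"x": 0}                  # x starts as 0
    a_has_read = threading.Event()    # set once A has evaluated x + 1
    b_is_done = threading.Event()     # set once B has finished its 100 increments
    history = []                      # (thread, value of x after its last store)

    def thread_a():
        tmp = state["x"] + 1          # A evaluates x + 1, getting 1
        a_has_read.set()
        b_is_done.wait()              # A is "preempted" here while B runs
        state["x"] = tmp              # A finishes the statement with a stale value
        history.append(("A", state["x"]))

    def thread_b():
        a_has_read.wait()
        for _ in range(100):          # B executes the full statement 100 times
            state["x"] = state["x"] + 1
        history.append(("B", state["x"]))
        b_is_done.set()

    a = threading.Thread(target=thread_a)
    b = threading.Thread(target=thread_b)
    a.start()
    b.start()
    a.join()
    b.join()
    return history, state["x"]

history, final_x = run_forced_interleaving()
print(history, final_x)
```

Every store here writes a value that some thread computed as "x + 1", yet the last store takes x from 100 back down to 1.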

But I agree with you that this is a race condition, not a data race.

> In languages where data races are possible you can have every thread only ever increment x, and yet x's value will decrease due to a data race.

That would be a very strange language, or a very strange machine.

In particular, integer reads and writes are atomic (though not necessarily well ordered) on x86, ARM, etc., so you'd be hard pressed to get a C compiler to emit code that tears the reads or writes in a way that would lead to integer decrements.

Your statement is too strong and comes with important caveats. Only certain aligned memory accesses to non-floating-point data types are atomic. Floating point values, unaligned accesses and even integer operations via SIMD instructions are not atomic on the platforms you list. You can absolutely produce an unaligned pointer in C, or use SIMD to increment an integer value (in conjunction with other values), in which case there is no guarantee of atomicity and hence the potential for data to be clobbered if it's not protected by a synchronization primitive.

But (other than floating point, which is surprising, if true for word and smaller values on machines that actually have floating point units), all of those examples are either non-portable assembly code (SIMD) or directly violate the C memory model (unaligned access).

You can't dereference a non-aligned pointer in portable C. It will bus error on certain architectures (including some arm variants).

(I've written plenty of code that relies on unaligned reads, and also that relies on non-torn reads/writes, just not at the same time, and never when portability was a concern.)

We're jumping all over the place I'm afraid. If you're talking about writing standard/portable C then data races are undefined behavior and hence there is no portable manner in which a data race can be observed. A C compiler is free to produce any observable behavior whatsoever in the presence of a data race.

If you wish to discuss x86 or ARM, well, a data race can occur in a C program through the use of SIMD instructions or writing through an unaligned pointer. If you want to pick an architecture that does not allow unaligned accesses, sure, we can discuss the PowerPC 500 series where unaligned accesses are a bus error, but then reads and writes of 32-bit values are not atomic and hence can produce data races.
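
To make the non-atomic 32-bit case concrete, here is a rough simulation of a torn store. It is plain Python arithmetic on a byte buffer, not real hardware behavior. The assumptions are mine: the store is split into two 16-bit halves, the low half is written first, and the layout is little-endian. `torn_write_states` is a hypothetical helper name.

```python
import struct

def torn_write_states(old, new):
    """Return (intermediate, final) values visible to a reader when a
    32-bit store of `new` over `old` is performed as two 16-bit stores,
    low half first (an assumed split, for illustration only)."""
    buf = bytearray(struct.pack("<I", old))     # memory currently holds `old`
    new_bytes = struct.pack("<I", new)
    buf[0:2] = new_bytes[0:2]                   # first half of the store lands
    intermediate = struct.unpack("<I", buf)[0]  # what a racing reader could see
    buf[2:4] = new_bytes[2:4]                   # second half of the store lands
    final = struct.unpack("<I", buf)[0]
    return intermediate, final

old = 0x0000FFFF
new = old + 1                                   # a single increment: 0x00010000
intermediate, final = torn_write_states(old, new)
print(hex(old), hex(intermediate), hex(final))
```

Under those assumptions the writer only ever increments, but a reader that lands between the two halves sees 0 where the previous value was 0xFFFF.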

We can't mix properties of one architecture with properties of another architecture and also discuss portable C. Any consideration of data races or undefined behavior in general must be specific to a particular architecture and we must apply the rules of any given architecture consistently.