by mananaysiempre 25 days ago
And it makes sense as long as you allow the concept of unsequenced operations at all (admittedly it’s somewhat rare; e.g. in Scheme such things are defined to still occur in sequence, but which specific sequence is unspecified and potentially different each time). The “volatile” annotation marks your variable as being an MMIO register or something of that nature, something that could change at any point for reasons outside of the compiler’s control. Naturally, this means all of the hazards of concurrent modification are potentially there.
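
A minimal sketch of the MMIO case described above, assuming a hosted system where an ordinary variable stands in for the hardware register. `fake_status_reg`, `STATUS_READY` and `wait_until_ready` are made-up names for illustration, not anything from a real device or library:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bit; a real device defines its own layout. */
#define STATUS_READY 0x1u

/* Stand-in for a memory-mapped register so the sketch runs on a hosted
   system. On real hardware this would be a fixed address instead. */
volatile uint32_t fake_status_reg;

/* Poll the register until the READY bit is set or the attempts run out.
   Because *status is volatile, every iteration performs a fresh read
   rather than reusing a value loaded before the loop. */
bool wait_until_ready(volatile uint32_t *status, unsigned max_polls)
{
    assert(status != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (*status & STATUS_READY) {
            return true;
        }
    }
    return false;
}
```

Each read here sits in its own loop iteration, so the accesses are sequenced one after another; the hazard discussed in this thread only arises when two accesses to the same volatile object land in one expression with no sequencing between them.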

That said, your “common parlance” definition of “data race” is not the definition used by the C standard, so your last sentence is at best misleading in a discussion of standard C.

> The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

(Here “conflicting” and “happens before” are defined in the preceding text.)
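
To make the quoted definition concrete, here is a hedged sketch using C11 atomics with POSIX threads (pthreads is an assumption; the standard's own `<threads.h>` would work the same way). `shared_counter`, `worker` and `count_with_two_threads` are illustrative names. If the counter were a plain `long`, the two workers would perform conflicting non-atomic writes with no happens-before relation between them, which is exactly the data race the standard describes:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define INCREMENTS_PER_THREAD 100000

/* Shared by both workers. Declared atomic so the concurrent
   increments below are not a data race. */
atomic_long shared_counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        atomic_fetch_add(&shared_counter, 1);
    }
    return NULL;
}

/* Runs two workers to completion and returns the final count. */
long count_with_two_threads(void)
{
    pthread_t a, b;
    atomic_store(&shared_counter, 0);

    int rc = pthread_create(&a, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, NULL);
    assert(rc == 0);

    /* Joining orders each worker's increments before the final load. */
    rc = pthread_join(a, NULL);
    assert(rc == 0);
    rc = pthread_join(b, NULL);
    assert(rc == 0);
    (void)rc;

    return atomic_load(&shared_counter);
}
```

Note that `volatile` plays no part in this: under the standard's definition it is atomicity and happens-before that decide whether a data race exists.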

Your first paragraph makes it sound as if the compiler will actually generate two reads of the value of some register, which might lead to unexpected effects at runtime for certain special registers.

However, this is not at all what UB means in C (or C++). The compiler is free to optimize away the entire block of code where this printf() sequence occurs, by the logic that it would be UB if the program were to ever reach it.

For example, the following program:

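  #include <stdio.h>
  #include <stdlib.h>

  int main(void) {
    int y = rand();
    if (y != 8) {
      volatile int x;          /* never initialized */
      printf("%d: %d", x, x);  /* two unsequenced volatile reads of x */
    } else {
      printf("y is 8");
    }
  }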
can be optimized to always print "y is 8" by a perfectly standards-compliant compiler.
> Your first paragraph makes it sound as if the compiler will actually generate two reads of the value of some register, which might lead to unexpected effects at runtime for certain special registers.

I don’t see how. I was trying to explain why it’s reasonable for a volatile read to be a side effect, after which the C rule on unsequenced side effects applies, yielding UB as you say.
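
For comparison, a sketch of the same two reads written so that they are sequenced, which sidesteps the unsequenced-side-effects rule entirely. A plain variable stands in for the register, and `sensor_reg`, `read_twice` and `format_two_reads` are made-up names for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Made-up stand-in for a hardware register. */
volatile int sensor_reg = 42;

/* Reads *reg exactly twice, in a fixed order. Each initializer is its
   own full expression, so the first read is sequenced before the second. */
void read_twice(volatile int *reg, int *first, int *second)
{
    assert(reg != NULL && first != NULL && second != NULL);
    int a = *reg; /* first volatile access */
    int b = *reg; /* second volatile access, sequenced after the first */
    *first = a;
    *second = b;
}

/* Formats both reads into buf. Unlike printf("%d: %d", x, x), both reads
   have already happened by the time the arguments are evaluated. */
int format_two_reads(char *buf, size_t len)
{
    int a, b;
    read_twice(&sensor_reg, &a, &b);
    return snprintf(buf, len, "%d: %d", a, b);
}
```

With the reads pulled into separate statements there is nothing left for the unsequenced rule to apply to, so the usual volatile guarantees about the number and order of accesses are the only ones in play.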

"volatile" tells the compiler it is _not_ safe to optimise away any read or write, so it can't just optimise that section away at all.

> An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously.

A compliant compiler is only free to optimise code away where it can determine there are no side effects. But 5.1.2.3 says this about volatile:

> Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects.

Yes, but undefined behaviour is undefined behaviour, and that behaviour can legally be that the code is not emitted at all, volatile (or any other side effect) or not. (Compilers do reason about undefined behaviour when optimising, so this isn't necessarily a completely theoretical argument, though I don't know which of 'don't optimise volatile' or 'do assume undefined behaviour is impossible and remove code that definitely invokes it' would 'win' in a compiler's actual logic, or whether there's any current compiler that would flag this as unconditionally undefined behaviour in the first place.)
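
As a hedged illustration of compilers reasoning about UB in general (signed overflow here, not the volatile case under debate), consider the sketch below. The function names are made up for the example. An optimizing compiler may assume the overflow never happens and reduce the naive check to a constant:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Looks like an overflow check, but x + 1 overflows when x == INT_MAX,
   which is UB for signed int. A compiler may assume that never happens
   and fold the whole function to `return false`. */
bool increment_overflows_naive(int x)
{
    return x + 1 < x;
}

/* Same question asked without ever performing the overflowing addition. */
bool increment_overflows(int x)
{
    return x == INT_MAX;
}

/* Quick self-check that only exercises defined behaviour. */
void check_increment_overflows(void)
{
    assert(increment_overflows(INT_MAX));
    assert(!increment_overflows(0));
    assert(!increment_overflows_naive(0)); /* 0 + 1 does not overflow */
}
```

Whether the same style of reasoning is ever applied to an expression with unsequenced volatile accesses is the open question in this thread; the sketch only shows that the mechanism exists.
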
Volatile wins.

GCC calls that out [0] - volatile means things in memory may not be what they appear to be, and that there are asynchronous things happening, so something that may not appear to be possible, may become so, because volatile is a side-effect.

So about the only optimisation allowed to happen is combining multiple references.

Clang is similar:

> The compiler does not optimize out any accesses to variables declared volatile. The number of volatile reads and writes will be exactly as they appear in the C/C++ code, no more and no less and in the same order.

[0] https://www.gnu.org/software/c-intro-and-ref/manual/html_nod...

That's cool and all if you are writing GCC or Clang dialect C, but it doesn't change the fact that it is UB in the C standard.
This is all assuming that the code is not invoking undefined behaviour. If the code is invoking undefined behaviour, GCC and clang are both well within their rights to say 'none of the rest of our documentation applies' (and have historically done so on bug reports).
Sure it can. That code path has unconditional UB and thus it is not valid.
Only if there would be no side-effects. Which there are.
No, this is irrelevant to making this decision.
I've mentioned elsewhere that the standard, and compilers as well, disagree with you here.

But feel free to run it against the various compilers through godbolt. [0] They won't optimise the branch away. Accesses to a volatile must be preserved, in the order in which they appear. No optimisation, UB or otherwise, is allowed to impede that, because an access is a side-effect.

[0] https://godbolt.org/z/85cGhq3Ta

This looks like a long back-and-forth that can easily be solved by a minute or two on godbolt...
> that can easily be solved by a minute or two on godbolt...

Unfortunately it's not that simple when it comes to UB. If the snippet in question does in fact exhibit UB then there's no guarantee whatever Godbolt shows will generalize to other programs/versions/compilers/environments/etc.

No, compilers will often choose to not optimize on UB.
When a compiler decides something is UB, i.e. "the result of this code is not defined and could be anything", it selects the most performant version of undefined behavior: doing nothing, by optimizing the code away.
The compiler is not free to remove accesses to something marked volatile - it's defined as a side-effect.

Volatile means something else may be acting here. Something else may install anything into the register at any time - and every time you access it.

The compiler is required to preserve the order of accesses. In almost every C compiler, today, there are almost no optimisations the moment a volatile is introduced, for this reason.

If code has undefined behavior, the entire execution path that leads to that UB has no assigned semantics in the C model. So there are no volatile accesses in this code according to the C abstract machine - the entire execution path is UB, so it can be assumed it doesn't happen at all.
> An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine

The execution path has unknown side effects, and so the execution path must be strictly followed. That's uh... The entire point of that section in the C standard. It's why volatile is called out, in the semantic model for the abstract machine.

Otherwise... Why call it out, at all? It must be strictly followed, not lazily, as in other areas of the standard.

The print example has no defined order of accesses, since function parameters can be evaluated in any order. But further, the entire problem with UB is that it supersedes the regular guarantees that you get (like with volatile) when it's encountered. Yes, gcc and clang do the obvious thing that makes the most sense in this example, but what people are trying to tell you is that they could just not do that and they would still be complying with the standard. For example, you can imagine a more serious example of UB that causes the program to fail to compile completely, and then do you emit the correct number of in-order reads of volatile variables? Obviously not.
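
A small sketch of the evaluation-order point, using ordinary function calls so that nothing here is UB (calls are indeterminately sequenced, so only the order is unspecified). `note`, `add`, `call_log` and the two `sum_*` functions are illustrative names:

```c
#include <assert.h>

/* Records the order in which note() was called. */
char call_log[8];
int  call_count;

static int note(char tag, int value)
{
    assert(call_count < (int)sizeof call_log - 1);
    call_log[call_count++] = tag;
    call_log[call_count] = '\0';
    return value;
}

static int add(int x, int y)
{
    return x + y;
}

/* The order of the two note() calls is unspecified: the log may end up
   as "ab" or "ba", and either is a conforming result. */
int sum_unspecified_order(void)
{
    call_count = 0;
    call_log[0] = '\0';
    return add(note('a', 1), note('b', 2));
}

/* Separate statements pin the order, so the log is always "ab". */
int sum_fixed_order(void)
{
    call_count = 0;
    call_log[0] = '\0';
    int first = note('a', 1);
    int second = note('b', 2);
    return add(first, second);
}
```

Both functions return 3; only the second gives any guarantee about what ends up in `call_log`.
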
Function parameters cannot be evaluated in any order, when one of them is a volatile.

> The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject

And what I am trying to tell people is that the standard has expectations around the volatile keyword that the compilers took into account when designing how they would work - it isn't just kindness, it's compliance. But no one is actually talking about the quotes from the standard; they are just quoting themselves and their own understandings.