Hacker News

by kloch 1699 days ago

> "volatile" does not mean what you think it means; if you're using it for anything other than interacting with hardware registers in a device driver, you're almost certainly using it incorrectly.

Another "correct" use of volatile is as a hack to keep compilers from optimizing away certain code. It's pretty rare to need that, and often you can just use a lower optimization level (like the usual -O2), but sometimes you need -O3 / -Ofast or similar, plus a strategically placed volatile declaration, to keep everything working.

A classic example is the Kahan summation algorithm. At -O2 it's fine. At -O3 or higher, the compiler silently defeats the algorithm while it still appears to work (you get a sum, but without the error compensation). Declaring the working variables as volatile makes it work again. The Wikipedia pseudocode notes this with the comment "// Algebraically, c should always be zero. Beware overly-aggressive optimizing compilers!"

https://en.wikipedia.org/wiki/Kahan_summation_algorithm

Of course, -O3 might not be any faster anyway, but that's another topic.

vlovich123 1699 days ago

I can’t imagine it’s an O2 vs. O3 thing unless a compiler enables “fast-math” optimizations that allow reassociation. Neither Clang nor GCC does this (nor does MSVC, I think); optimization levels never silently turn off IEEE 754 floating-point semantics. I don’t know about ICC, but it sounds like they stupidly enable fast math by default to try to win at benchmarks.

Do you have anything to actually support this statement, or did you just assume “overly aggressive optimizing compilers” and “O3” are somehow linked?

Generally, optimization levels may find more opportunities to exploit UB, but they do not change the semantics of the language, and every language I’m familiar with defines floating-point addition as non-associative, because with finite precision it isn't associative.

TL;DR: Don’t use volatile unless you really know what you’re doing, and unless you know C/C++ really well, you probably don’t. If anyone tells you to throw in a volatile to “make things work”, it’s most likely cargo-cult bad advice (not always, but probably).

gpderetta 1699 days ago

There is some truth to what the parent is saying. Ages ago, when x86 only had x87 FP, GCC would program the FPU to use 80-bit precision even when dealing with doubles. The excess precision meant that GCC could not implement IEEE math correctly even without fast-math. Forcing intermediate values to be stored to memory via volatile variables was a partial solution to this problem.

MSVC configures the FPU to use 64-bit (double) precision, which means that double works fine, but there is no 80-bit long double, and float still suffers from excess precision.

SSE avoids all of these problems.

vlovich123 1699 days ago

Kind of, but that still shouldn't have affected Kahan summation, which only cares about associativity, and extended precision doesn't change that. They would just get more numerically accurate results on x87.

kloch 1699 days ago

I recently tested Kahan summation on my MacBook Pro, and -O3 defeated the algorithm while -O2 did not. Declaring the variables below as volatile restored error compensation at -O3.

The relevant code is:

          kahan_y=g_sample_z - kahan_c;
          kahan_t=g_sample_z_sum + kahan_y;
          kahan_c=(kahan_t - g_sample_z_sum) - kahan_y;
          g_sample_z_sum=kahan_t;

(This is in an inner loop: each iteration calculates a new g_sample_z and then adds it to the running g_sample_z_sum with this snippet.)

vlovich123 1699 days ago

Sounds like a compiler bug to me. Can you file a bug to clang with a reduced standalone test (or I can do it for you if you share the standalone test).

kloch 1699 days ago

Here is a complete, simplified Kahan summation test, and indeed it works at -O3 but fails at -Ofast. Something else must have been going on in my real program at -O3. However, the original point that volatile can work around some optimization problems still stands: you may want the rest of your program to benefit from -Ofast without breaking certain parts.

Changing the three kahan_* variables to volatile makes this work (slowly) with -Ofast.

  #include <stdio.h>

  int main(int argc, char **argv) {
    int i;
    double sample, sum;
    double kahan_y, kahan_t, kahan_c;

    // initial values
    sum=0.0;
    sample=1.0; // start with "large" value

    for (i=0; i <= 1000000000; i++) { // add 1 large value plus 1 billion small values
      // Kahan summation algorithm
      kahan_y=sample - kahan_c;
      kahan_t=sum + kahan_y;
      kahan_c=(kahan_t - sum) - kahan_y;
      sum=kahan_t;

      // pre-load next small value
      sample=1.0E-20;
    }
    printf("sum: %.15f\n", sum);
  }

vlovich123 1699 days ago

Correct. `-Ofast`'s claim to fame is that it enables `-ffast-math`, which is why the documentation puts huge warning signs around it. `-ffast-math` turns on associative math (`-fassociative-math`), which is problematic for Kahan summation. Rather than sprinkling in volatiles, which pessimize the compiler to no end, I would recommend annotating the problematic function to turn off associative math [1][2].

Something like:

    [[gnu::optimize("no-associative-math")]]
    double kahanSummation() {
      ...
    }

That way the compiler still applies every other optimization it can and turns off only associative math. This should work on Clang and GCC and be faster overall in every case.

This is what I mean by "If you're sprinkling volatile around, you probably aren't doing what you want and are just cargo-culting bad advice."

[1] https://stackoverflow.com/questions/26266820/in-clang-how-do...
[2] https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/Function-Attrib...

hermitdev 1699 days ago

I hope this isn't the actual "real" code, because you've got undefined behavior before you even have to worry about the associativity optimizations. There's an uninitialized read of 'kahan_c' on the first loop iteration.