by ot 338 days ago
That's a nice trick, but contrary to function statics, it is susceptible to SIOF. This kind of optimization is useful only on extraordinarily hot paths, so I wouldn't generally recommend it.

> On ARM, such atomic load incurs a memory barrier---a fairly expensive operation.

Not quite: it is just a load-acquire, which is almost as cheap as a normal load. And on x86 there's no difference, since ordinary loads already have acquire semantics there.
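For illustration, here is a minimal sketch of a guard check whose fast path is a single acquire load. The `Lazy` type, its members, and the value 42 are made up for the example and are not taken from the article. On AArch64, GCC and Clang typically compile the acquire load to a load-acquire instruction such as `ldar` rather than a load plus a separate barrier, and on x86 to a plain `mov`.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical lazily initialized value; all names are illustrative only.
struct Lazy {
    std::atomic<bool> ready{false};
    std::mutex init_mutex;
    int value = 0;

    int get() {
        // Fast path: a single acquire load of the guard.
        if (ready.load(std::memory_order_acquire)) {
            return value;
        }
        // Slow path: serialize initializers and re-check under the lock.
        std::lock_guard<std::mutex> lock(init_mutex);
        if (!ready.load(std::memory_order_relaxed)) {
            value = 42;  // stand-in for an expensive initialization
            // Release store pairs with the acquire load above, so a thread
            // that sees ready == true also sees the write to value.
            ready.store(true, std::memory_order_release);
        }
        return value;
    }
};
```

Calling `get()` on a fresh `Lazy` takes the slow path once; every later call returns after the acquire load.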

One thing both GCC and Clang seem to be quite bad at is code layout: even in the example in the article, the slow path is largely inlined. It would be much better to have just a load, a compare, and a jump to a slow path placed in a cold section. In my experience, in some rare cases reimplementing the lazy initialization explicitly (especially when it's possible to use a sentinel value, thus doing a single load for both value and guard) did produce a noticeable win.
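A rough sketch of what such an explicit reimplementation could look like, assuming the value fits in one atomic word and that 0 can never be a valid result, so it can serve as the sentinel. `compute_value`, `init_slow_path`, `get_value`, and the constant 12345 are hypothetical names and values chosen for the example; the `cold`/`noinline` attributes and `__builtin_expect` are GCC/Clang extensions, hence the guards.

```cpp
#include <atomic>
#include <cstdint>

#if defined(__GNUC__)
#define COLD_NOINLINE __attribute__((cold, noinline))
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define COLD_NOINLINE
#define UNLIKELY(x) (x)
#endif

// Sentinel: assumed never to be a valid computed value.
constexpr std::uint64_t kUninitialized = 0;

// One word serves as both the guard and the cached value.
std::atomic<std::uint64_t> g_cached{kUninitialized};

// Hypothetical stand-in for the expensive computation.
std::uint64_t compute_value() { return 12345; }

// Slow path kept out of line and marked cold, so the caller only
// contains the load, the compare, and a jump here.
COLD_NOINLINE std::uint64_t init_slow_path() {
    std::uint64_t computed = compute_value();
    std::uint64_t expected = kUninitialized;
    // First writer wins; a thread that loses the race adopts the winner's value.
    if (!g_cached.compare_exchange_strong(expected, computed,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
        computed = expected;
    }
    return computed;
}

inline std::uint64_t get_value() {
    // Single load covers both "is it initialized?" and "what is it?".
    std::uint64_t v = g_cached.load(std::memory_order_acquire);
    if (UNLIKELY(v == kUninitialized)) {
        return init_slow_path();
    }
    return v;
}
```

Note the trade-off in this sketch: under contention `compute_value` may run more than once, so it has to be safe to call repeatedly. A mutex in the slow path would avoid that without touching the fast path.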

2 comments

> That's a nice trick, but contrary to function statics, it is susceptible to SIOF.

For those (like me) who don’t recognize that abbreviation, “The static initialization order fiasco (ISO C++ FAQ) refers to the ambiguity in the order that objects with static storage duration in different translation units are initialized in” (https://en.cppreference.com/w/cpp/language/siof.html)
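As a small illustration of why function statics sidestep the problem, here is a sketch with made-up names (`name`, `g_name_length`). The risky pattern is shown only in comments, since its outcome depends on the order in which translation units are initialized.

```cpp
#include <cstddef>
#include <string>

// Risky pattern (comments only, spread over two hypothetical files):
//   a.cpp:  std::string g_name = "config";
//   b.cpp:  extern std::string g_name;
//           const std::size_t g_len = g_name.size();  // may run before g_name is constructed

// Function-local static: constructed on first call, and that
// construction is thread-safe since C++11.
const std::string& name() {
    static const std::string value = "config";
    return value;
}

// Safe no matter which translation unit this initializer lives in,
// because calling name() forces the string to be constructed first.
const std::size_t g_name_length = name().size();
```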

Yes, thanks for the clarification. What I probably should have said is that the trick is basically syntactic sugar for declaring a scoped global static variable, and as such it inherits all the problems of global static variables.

FDO/PGO seems to really improve optimizations for hot/cold functions. I wonder if it does the kind of thing you're suggesting.

Not with any of the Clang versions I tried, but the last time I checked was a couple of years ago.