|
|
|
|
|
by jstimpfle
713 days ago
|
|
Thanks for clarifying, I understand much better now. And I think I can agree that under a strict interpretation of the rule that UB doesn't get reordered with observable behaviour the GCC output in the godbolt is wrong. Maybe it has something to do with the fact that it's volatiles? I've hardly used volatiles, but as far as I know their semantics have traditionally been somewhat wacky -- poorly understood by programmers and having inconsistent implementations in compilers. I think I've once read that a sequence of volatile accesses can't be reordered, but other memory accesses can very well be reordered around memory accesses. Something like that -- maybe the rules in the compiler are too complicated leading to an optimization like that, which seems erroneous. But look at this, where I've replaced the volatile access with a printf() call as you describe: https://godbolt.org/z/Ec8aYnc3d . It _does_ get optimized if the division comes before the printf. The compiler seems to be able to do the hoisting (or maybe that can be called "peeling" too?). But not if you swap the two lines such that the printf comes before the division. Maybe the compiler does in fact see that to keep ordering of observable effects, it would have to duplicate both lines, effectively duplicating the entire loop body for a single loop iteration. In any case, it's keeping both the printf() and the div in the loop body. |
|
If the first div happens before the first printf, then it can be CSE out of the loop as any trap would have happened before the printf anyway, so no reordering, and if it didn't trap the first time it wouldn't have trapped later either. In this case CSE is fine and there are no reordering.
If the div happens after printf, then reordering is prohibited not only to preserve side effects before UB (which we have seen GCC doesn't necessarily respects), but because for the most part printf is treated as an opaque function: it could legitimately exit, or longjump out of the function or never return, so on the abstract machine the UB might not happen at all. So it is not safe to hoist trapping instruction like div above opaque functions (but it is safe to sink them).
Still the modification I showed for volatile can be applied as well: peel the first iteration out of the loop so that the first printf can be done before computing the div to be CSEd out of the loop. But GCC doesn't do it although it seems desirable.