by klodolph 3707 days ago
Cannot disagree more about #3. You almost never want asm volatile. The compiler is mostly doing data flow analysis, and I've seen so many programmers who don't understand that. So, if the compiler's data flow analysis doesn't put your asm block where you want, you just give up and put "volatile" on it. NO! Just let the compiler figure it out. You may be smarter about generating the assembly in this case, but the compiler is still very good at putting the assembly in the right place in your code. Usually, if I see "asm volatile" in someone's code, I step back and think "there's probably something wrong with the assembly" and I go back and read the manual on asm operand constraints, and then I find something wrong with the constraints. With the correct constraints / clobbers in place, my experience is that removing "volatile" only improves things.

Of course this is not true for synchronization primitives and the like.
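
To make the constraints point concrete, here is a minimal sketch assuming GCC or Clang on x86-64 (with a plain C fallback elsewhere). The helper names add_u64 and compiler_barrier are hypothetical, chosen for illustration. The first asm needs no "volatile" because its operands and clobbers describe everything it does; the second is the kind of synchronization-related case where "volatile" plus a "memory" clobber is the conventional spelling.

```c
#include <stdint.h>

/* Hypothetical helper: adds b to a with a single x86-64 instruction. */
static inline uint64_t add_u64(uint64_t a, uint64_t b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* No "volatile": the constraints describe the whole effect.
       "+r"(a) : a is read and written, in a register
       "r"(b)  : b is only read, in a register
       "cc"    : addq changes the flags
       The compiler is free to move, merge, or drop this statement,
       exactly as it could for the C expression a + b. */
    __asm__("addq %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a += b; /* portable fallback so the sketch builds on other targets */
#endif
    return a;
}

/* Hypothetical helper: a compiler-level barrier. It emits no instruction,
   but the "memory" clobber tells the compiler that memory may have changed,
   so it must not cache values in registers or reorder memory accesses
   across this point. */
static inline void compiler_barrier(void)
{
#if defined(__GNUC__)
    __asm__ volatile("" : : : "memory");
#endif
}
```

Per the GCC manual, an asm statement with no output operands is implicitly volatile, so the qualifier on the barrier is mostly documentation of intent.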


I understand that most others share your position, and mostly agree when it comes to volatile variables. I'd also agree with you if removing "volatile" caused the code to break. But I think that it can be necessary for performance, and don't think that there are true downsides. I believe if you are using assembly it is because you don't want the compiler to attempt any further optimizations. For the cases when I want to drop to assembly, it's because I've already decided the register allocation and instruction ordering I want, and will verify the assembly that is generated.

My goal is to "lock in" an established level of performance once I've achieved it, so that compiler upgrades or changes don't result in performance drops. I often compare the output of multiple compilers with a matrix of optimization flags, choose the best blocks from each, and then hand-optimize from there while cross-referencing Agner's handbooks with Likwid's performance reports. If I've chosen to use inline assembly, the chances that the compiler will succeed in further optimizing my code are very low.

I realize it's not a popular view, but I think that using volatile with __asm is usually the correct approach. If you don't need "volatile", you probably should be using an intrinsic instead. I think the alternative (which may in fact be the better solution) is dropping to straight assembly for the entire function or distributing binary code.
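
As a small illustration of the "use an intrinsic instead" option, here is a sketch using the GCC/Clang builtin __builtin_popcountll in place of a hand-written popcnt asm block. The builtin is my choice of example, not something named in the thread, and popcount_u64 is a hypothetical wrapper.

```c
#include <stdint.h>

/* Hypothetical wrapper: counts the set bits in x. */
static inline unsigned popcount_u64(uint64_t x)
{
#if defined(__GNUC__)
    /* The compiler understands this builtin, so it can constant-fold it,
       schedule it freely, and choose the instruction sequence for the
       target it is compiling for. */
    return (unsigned)__builtin_popcountll(x);
#else
    /* Portable fallback: clear the lowest set bit until none remain. */
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
#endif
}
```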

Yes, that is really a better solution: to write the whole function in assembly. "volatile" is just a poor substitute for that.

Other than the "code smell", what do you see as the main dangers of using "__asm volatile" rather than just "__asm"? Assuming that there are cases where I do get significantly better performance from specifying the exact ordering of instructions, what can I do to minimize these dangers while keeping the better performance?

The first danger is that "asm volatile" is basically a hack to get the output you want from the compiler. But the compiler is a rather complicated piece of software, and there is no guarantee that future versions of the compiler will still give you the desired output. Perhaps it works correctly now, but if you change your optimization settings, are you sure that something unexpected won't happen? Remember that "asm volatile" can still be moved around. From the GCC manual[1]:

> Do not expect a sequence of asm statements to remain perfectly consecutive after compilation, even when you are using the volatile qualifier. If certain instructions need to remain consecutive in the output, put them in a single multi-instruction asm statement.

The second danger is that "asm volatile" hides incorrect operand specification. When you examine the compiler's output, you might find the wrong assembly, and adding "volatile" might fix it. However, the incorrect operand specification might still cause problems in other parts of the code, and these are harder to diagnose. Stack Overflow is littered with questions from people who specify asm operands incorrectly and add "volatile" to fix the assembly, only to find that other things are still broken. My general procedure is to work with asm blocks at -O2 or higher without using volatile, and make sure I'm getting the desired results that way (unless I'm writing some synchronization primitives).
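
One common shape of this mistake, sketched under the assumption of GCC or Clang on x86-64 (plain C fallback elsewhere): an asm block that writes through a pointer without telling the compiler. The helpers store_u32 and store_then_load are hypothetical names for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: stores v to *p. */
static inline void store_u32(uint32_t *p, uint32_t v)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* "=m"(*p) tells the compiler that the asm writes *p. No "volatile"
       and no "memory" clobber are needed: the statement is kept as long
       as the stored value is used, and later loads of *p see it.

       A mis-specified variant such as
           __asm__ volatile("movl %1, (%0)" : : "r"(p), "r"(v));
       emits the same store, but the compiler is never told that *p
       changed, so it may keep using a value of *p it loaded earlier. */
    __asm__("movl %1, %0" : "=m"(*p) : "r"(v));
#else
    *p = v; /* portable fallback so the sketch builds on other targets */
#endif
}

/* Reads back through the same object to show the store is visible. */
static uint32_t store_then_load(uint32_t v)
{
    uint32_t x = 0;
    store_u32(&x, v);
    return x;
}
```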

Yet it is just so damn easy to write larger, multi-statement asm blocks. With larger blocks, the intent of the programmer is clear. It becomes obvious to both the reader and to the compiler that the assembly should be emitted as-is, rather than moved or reordered.
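
A minimal sketch of such a multi-instruction block, again assuming GCC or Clang on x86-64 with a C fallback: a 128-bit add where adcq must directly follow addq because it consumes the carry flag. The type u128 and the function add_u128 are hypothetical names.

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128; /* hypothetical 128-bit pair */

static inline u128 add_u128(u128 a, u128 b)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* Both instructions live in one asm statement, so nothing can be
       scheduled between them and disturb the carry flag.
       "+&r" on a.lo: it is written before b.hi is read, so it is marked
       early-clobber and never shares a register with another input.
       "cc": both instructions change the flags. */
    __asm__("addq %[blo], %[alo]\n\t"
            "adcq %[bhi], %[ahi]"
            : [alo] "+&r"(a.lo), [ahi] "+r"(a.hi)
            : [blo] "r"(b.lo), [bhi] "r"(b.hi)
            : "cc");
#else
    /* Portable fallback: detect the carry out of the low half by hand. */
    uint64_t lo = a.lo + b.lo;
    a.hi += b.hi + (lo < a.lo);
    a.lo = lo;
#endif
    return a;
}
```

Note that no "volatile" is needed here either: the ordering that matters is inside the single statement, and the operands describe everything else.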

Finally, you can often get the results you want with the auto-vectorizer, restrict, and __builtin_assume_aligned. Whenever that is possible I'd prefer it.
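
A hedged sketch of that approach, with no asm at all: the loop below gives the auto-vectorizer what it needs through restrict and __builtin_assume_aligned. The function name saxpy and the 32-byte alignment are assumptions for illustration; the caller must actually guarantee that alignment and that the arrays do not overlap, otherwise the behavior is undefined.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
#if defined(__GNUC__)
    /* Promise the compiler that both arrays start on a 32-byte boundary,
       so it can use aligned vector loads and stores. */
    const float *xa = __builtin_assume_aligned(x, 32);
    float *ya = __builtin_assume_aligned(y, 32);
#else
    const float *xa = x;
    float *ya = y;
#endif
    /* restrict promises that x and y do not alias, which removes the
       runtime overlap checks the vectorizer would otherwise need. */
    for (size_t i = 0; i < n; i++)
        ya[i] = a * xa[i] + ya[i];
}
```

Whether the loop is actually vectorized depends on the compiler version and flags (for GCC, typically -O3, or -O2 with -ftree-vectorize), so it is still worth checking the generated assembly.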

[1]: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html