by klodolph
3707 days ago
I couldn't disagree more about #3. You almost never want asm volatile. The compiler is mostly doing data-flow analysis, and I've seen so many programmers who don't understand that. When the compiler's data-flow analysis doesn't put their asm block where they want it, they just give up and put "volatile" on it. NO! Just let the compiler figure it out. You may be smarter than the compiler at generating the assembly in this case, but the compiler is still very good at placing that assembly in the right spot in your code. Usually, when I see "asm volatile" in someone's code, I step back and think "there's probably something wrong with the assembly", go back and reread the manual on asm operand constraints, and then find something wrong with the constraints. With the correct constraints and clobbers in place, my experience is that removing "volatile" only improves things. Of course, this does not apply to synchronization primitives and the like.
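A minimal sketch of what "correct constraints" can look like, assuming GCC or Clang extended asm on x86 (the function name add_asm is made up for illustration). Every input, output, and clobber is declared, so the asm has no hidden side effects and needs no "volatile": the compiler is free to hoist it out of a loop, merge duplicate calls, or delete it if the result is unused, exactly as it would with ordinary arithmetic.

```c
/* Hypothetical example: add two unsigned ints with GCC/Clang extended asm.
   No "volatile": the block's only effect is its declared output, so the
   optimizer may move, combine, or remove it like any other expression. */
static inline unsigned add_asm(unsigned a, unsigned b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+r"(a): a is both read and written, held in a register (%0).
       "r"(b):  b is a read-only input in a register (%1).
       "cc":    addl modifies the flags, so declare that clobber. */
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    /* Portable fallback so the sketch still builds on non-x86 targets. */
    return a + b;
#endif
}
```

If such a block seems to be misplaced or dropped, the first suspect is a missing output operand or clobber, not the absence of "volatile".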
My goal is to "lock in" a given level of performance once I've achieved it, so that compiler upgrades or changes don't cause performance regressions. I often compare the output of multiple compilers across a matrix of optimization flags, choose the best blocks from each, and then hand-optimize from there, cross-referencing Agner Fog's optimization manuals with LIKWID's performance reports. If I've chosen to use inline assembly, the chances that the compiler will optimize my code any further are very low.

I realize it's not a popular view, but I think that using volatile with __asm is usually the correct approach. If you don't need "volatile", you should probably be using an intrinsic instead. The alternative (which may in fact be the better solution) is to drop to straight assembly for the entire function, or to distribute binary code.
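To illustrate the intrinsic route, here is a sketch assuming GCC or Clang: a byte swap written as non-volatile inline asm next to the same operation through the compiler builtin __builtin_bswap32 (the wrapper names bswap_asm and bswap_builtin are hypothetical). With the builtin, the compiler knows exactly what the operation computes, so it can constant-fold it or fuse it with surrounding loads and stores, which it cannot do with an opaque asm block.

```c
#include <stdint.h>

/* Hypothetical wrapper: byte swap via non-volatile inline asm on x86. */
static inline uint32_t bswap_asm(uint32_t x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+r"(x): x is read and written in a 32-bit register.
       bswap does not modify the flags, so no "cc" clobber is needed. */
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    /* Non-x86 fallback: use the builtin directly. */
    return __builtin_bswap32(x);
#endif
}

/* Hypothetical wrapper: the same operation through the GCC/Clang builtin.
   The compiler understands its semantics, so bswap_builtin(0x11223344u)
   can be evaluated entirely at compile time. */
static inline uint32_t bswap_builtin(uint32_t x)
{
    return __builtin_bswap32(x);
}
```

Both functions return 0x44332211 for the input 0x11223344; the difference is only in how much the optimizer is allowed to see.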