Hacker News
by nkurz 3707 days ago
re 3: If this were about correctness, I'd agree. But I don't need volatile to make it work; I need it to produce the assembly I want. If one instruction can execute only on Port 1 (POPCNT) and the other can execute on Ports 0, 1, 5, or 6, the order in which two seemingly independent instructions are executed can sometimes change performance by 50%. Volatile also prevents the compiler from hoisting loads ahead of my inline assembly, which sometimes matters. Clobbering "memory" might force other reloads that I don't want.
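A minimal sketch of that pattern, assuming GCC or Clang on x86-64: POPCNT wrapped in `asm volatile` with no `"memory"` clobber, so nothing unrelated is forced to reload. The name `popcnt_asm` is hypothetical. GCC's documentation only promises that a volatile asm is not deleted or hoisted out of a loop, and it notes that volatile asm can still be moved relative to other code, so any ordering effect should be confirmed in the generated assembly.

```c
#include <stdint.h>

/* Hypothetical helper: population count via inline POPCNT on x86-64
 * (GCC/Clang), with a portable fallback everywhere else. */
uint64_t popcnt_asm(uint64_t x) {
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("popcnt")) {
        uint64_t r;
        /* volatile: not needed for correctness; it only stops the compiler
         * from deleting the statement or hoisting it out of a loop.
         * No "memory" clobber, so unrelated values are not reloaded. */
        __asm__ volatile("popcnt %1, %0" : "=r"(r) : "rm"(x) : "cc");
        return r;
    }
#endif
    /* Portable fallback: clear the lowest set bit until none remain. */
    uint64_t n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}
```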

re 5: Barring compiler bugs, I think you'd be right if correctness were the only issue. But I'm pretty sure I've sometimes solved problems by adding it, although that may have been while working around the POPCNT bug that creates a false dependency on the output register. It might also have been when reading and writing a variable multiple times.
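For reference, a common workaround for the POPCNT false output dependency is to zero the destination with XOR just before POPCNT, since the CPU treats that zeroing idiom as dependency-breaking. This is a sketch assuming GCC or Clang on x86-64; `popcnt_nodep` is a hypothetical name. The early-clobber `"=&r"` matters here: the destination is written by the XOR before the input is read, so it must not share a register with `x`.

```c
#include <stdint.h>

/* Hypothetical helper: POPCNT with the destination zeroed first. */
uint64_t popcnt_nodep(uint64_t x) {
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("popcnt")) {
        uint64_t r;
        __asm__ volatile(
            "xor %k0, %k0\n\t" /* zero idiom: drops the dependency on r's old value */
            "popcnt %1, %0"
            : "=&r"(r)         /* early clobber: written before x is consumed */
            : "rm"(x)
            : "cc");
        return r;
    }
#endif
    /* Portable fallback. */
    uint64_t n = 0;
    for (; x; x &= x - 1)
        n++;
    return n;
}
```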

re 6: In theory, yes. But in those cases you should usually be writing intrinsics or straight C rather than inline assembly. Where this comes up most for me is when two arrays share the same index and I want to ensure DEC/JNZ macro-fusion at the end of the loop. If I let the compiler choose, it will find a way to defeat me by incrementing both array addresses separately. The other case is when you explicitly want a store to use Port 7 for address generation, which only happens when the address has no index register.
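A minimal sketch of the shared-index case, assuming GCC or Clang on x86-64: both loads use one index register that counts down, so the loop ends in a DEC/JNZ pair that can be macro-fused. `sum_two` is a hypothetical helper, and the `"memory"` clobber is there because the asm reads array elements that are not listed as operands.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the sum of a[i] + b[i] for i in [0, n). */
uint64_t sum_two(const uint64_t *a, const uint64_t *b, size_t n) {
    uint64_t sum = 0;
    if (n == 0)
        return 0;
#if defined(__x86_64__) && defined(__GNUC__)
    size_t i = n; /* one index for both arrays, counting down to zero */
    __asm__ volatile(
        "1:\n\t"
        "add -8(%[a],%[i],8), %[sum]\n\t" /* sum += a[i-1] */
        "add -8(%[b],%[i],8), %[sum]\n\t" /* sum += b[i-1] */
        "dec %[i]\n\t"                    /* sets ZF when i reaches 0 */
        "jnz 1b"                          /* DEC/JNZ: macro-fusion candidate */
        : [sum] "+r"(sum), [i] "+r"(i)
        : [a] "r"(a), [b] "r"(b)
        : "cc", "memory");
#else
    for (size_t i = 0; i < n; i++)
        sum += a[i] + b[i];
#endif
    return sum;
}
```

If the same loop also stored through that index, for example to `-8(%[d],%[i],8)`, the store would use an indexed address, which per the comment above rules out Port 7 for its address generation.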

re 7: Yes, I just personally find it more confusing because "x" fits so well with "XMM", so it feels odd to use it when you want only a "YMM". Also, see here for problems with Clang and %q[VEC]: http://stackoverflow.com/questions/34459803/in-gnu-c-inline-...
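To illustrate the naming point, assuming GCC or Clang on x86-64: the `"x"` constraint means "any SSE register", and when the operand is a 256-bit `__m256i` the compiler prints it as a YMM register, even though the letter suggests XMM. GCC's x86 operand modifiers `x` and `t` can also print the XMM or YMM name of an operand explicitly. `add4` and `add4_avx2` are hypothetical names; the AVX2 path is chosen at run time, with a scalar fallback.

```c
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

/* Adds four 64-bit lanes with a single VPADDQ on YMM registers. */
__attribute__((target("avx2")))
static void add4_avx2(const uint64_t *a, const uint64_t *b, uint64_t *out) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    __m256i vr;
    /* "x" = any SSE register; for a 256-bit type it is printed as %ymmN. */
    __asm__("vpaddq %[b], %[a], %[r]"
            : [r] "=x"(vr)
            : [a] "x"(va), [b] "x"(vb));
    _mm256_storeu_si256((__m256i *)out, vr);
}
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, 4). */
void add4(const uint64_t *a, const uint64_t *b, uint64_t *out) {
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx2")) {
        add4_avx2(a, b, out);
        return;
    }
#endif
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
}
```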

re 4: Oops, I forgot to renumber. I had another comment suggesting that one should always use the "V" (VEX-encoded) form of vector instructions with an explicit output register, but I deleted it because it seemed off topic.
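The deleted suggestion can be sketched as follows, assuming GCC or Clang on x86-64. Legacy SSE `PADDQ` overwrites its first source, so the asm must tie the output to an input with the `"0"` matching constraint, which forces a copy whenever the compiler still needs that input. The VEX form `VPADDQ` names a separate destination, so the output is just another `"=x"` operand. Mixing legacy-SSE and VEX encodings can also cost transition penalties on some Intel CPUs when the upper halves of the YMM registers are dirty. `add2`, `add2_sse`, and `add2_vex` are hypothetical names.

```c
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

/* Legacy SSE: PADDQ is destructive (dst += src), so operand 0 is tied
 * to the first input with the "0" matching constraint. */
static void add2_sse(const uint64_t *a, const uint64_t *b, uint64_t *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vr;
    __asm__("paddq %[b], %[r]" : [r] "=x"(vr) : "0"(va), [b] "x"(vb));
    _mm_storeu_si128((__m128i *)out, vr);
}

/* VEX: VPADDQ has an explicit destination, so the output register is
 * independent of both inputs. */
__attribute__((target("avx")))
static void add2_vex(const uint64_t *a, const uint64_t *b, uint64_t *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vr;
    __asm__("vpaddq %[b], %[a], %[r]"
            : [r] "=x"(vr)
            : [a] "x"(va), [b] "x"(vb));
    _mm_storeu_si128((__m128i *)out, vr);
}
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, 2). */
void add2(const uint64_t *a, const uint64_t *b, uint64_t *out) {
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx"))
        add2_vex(a, b, out);
    else
        add2_sse(a, b, out);
#else
    for (int i = 0; i < 2; i++)
        out[i] = a[i] + b[i];
#endif
}
```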