|
|
|
|
|
by pixelesque
28 days ago
|
|
Yep, same here and agree. Compilers have definitely got better though: another issue in the past (maybe still is to a degree? although compilers have got a lot better at this in the past 15 years, but it used to be one of the things only Intel's ICC actually got right), that if you wrapped the base-level '__m128' or 'float32x4_t' in a struct/union in order to provide some abstraction, the compiler would often lose track of this when passing the struct/union through functions (either by value or const ref), and would often end up 'spilling' (not entirely the correct terminology in this context, but...) the variable from registers, and just producing asm which ended up uselessly loading the variable again from a stack address further up the call stack, when it didn't actually need to do that. So that was the situation even when using intrinsics within custom wrappers. From 2011 to around 2013 ICC seemed to be the only compiler on amd64 which wouldn't do this. If you passed the actual '__m128' down the function call chain instead, clang and gcc would then do the right thing. |
|
Using __attribute__ to tweak calling conventions can often really clean this up, but that's just as obscure and non-portable as the problem it fixes. So you either end up writing weird non-portable code one way or weird non-portable code another... Code working with these types doesn't get to benefit from zero-cost abstraction to the degree we're used to with normal scalar code.