by Const-me
1639 days ago
> there only needs to be one way of doing it that the compilers knows about

Too much magic for my taste. If the compiler is going to do that anyway, why not expose an intrinsic we can use? The SSE instruction in question is rather efficient to emulate on NEON: it only takes two instructions, vmovn_u64 and vmull_u32.

The same applies to scalar code. When I need to rotate an integer, I normally use intrinsics instead of relying on the compiler to optimize the code. Recently, C++ even added these to the standard library, in the <bit> header in C++20.

IMO, relying on such compiler optimizations is fragile in the long run, for 2 reasons.

1. These are undocumented implementation details. Compiler developers don't make any guarantees that they will continue to support these things in exactly the same way.

2. Most real-life software is developed by multiple people. It's too easy for developers to neglect comments and slightly change the code in a way that no longer matches a shortcut in the compiler.
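To illustrate the rotate case, here is a minimal sketch of the two styles. The function names rotl_idiom and rotl_explicit are mine, made up for illustration. The first depends on the optimizer recognizing the shift-and-or pattern; the second asks for the rotate by name via std::rotl, and falls back to the idiom only when the C++20 library feature is not available.

```cpp
#include <cstdint>
#if defined(__has_include)
#  if __has_include(<bit>)
#    include <bit>   // C++20: std::rotl, defines __cpp_lib_bitops
#  endif
#endif

// Pattern-based rotate: correct on its own, but whether it compiles to a
// single rotate instruction depends on the optimizer recognizing the idiom.
// Masking keeps both shift counts in [0, 31], so n == 0 is well defined.
inline std::uint32_t rotl_idiom(std::uint32_t x, unsigned n) {
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

// Explicit rotate: the intent is stated by name instead of by pattern.
inline std::uint32_t rotl_explicit(std::uint32_t x, unsigned n) {
#if defined(__cpp_lib_bitops)
    return std::rotl(x, static_cast<int>(n));
#else
    return rotl_idiom(x, n);  // pre-C++20 fallback
#endif
}
```

Both return the same values. The difference is that a later edit to rotl_idiom (say, dropping the masks) can silently fall out of the pattern the compiler knows about, while the named call cannot.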