by hudsonwillis
819 days ago
Interesting article. I looked into a few of the cases mentioned, and here are my thoughts:

1. Pretty much all libgcc builtins (especially the ones manipulating integers, i128, or bits) are known to be poorly optimized and probably not well tuned for recent architectures. Everywhere I've worked uses libdivide or hand-rolled implementations rather than calling something like __addvti3.

2. Many of the cases that compile inefficiently for 32-bit targets compile fine on amd64. Perhaps it's a lack of maintenance on 32-bit platforms?

3. When two intrinsics are used in the same function, the result is almost always slower than a hand-crafted snippet implementing the same semantics. Compilers aren't smart enough in this case.
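To make point 1 concrete, here is a rough sketch of the kind of hand-rolled replacement I mean. As I understand it, __addvti3 is libgcc's overflow-trapping signed 128-bit add (the routine GCC calls for __int128 addition under -ftrapv). The helper name checked_add_i128 is made up for illustration, and the code assumes a GCC/Clang 64-bit target where __int128 exists; it reports overflow instead of trapping, so it is not a drop-in replacement.

```c
#include <stdbool.h>

typedef __int128 i128;            /* GCC/Clang extension, 64-bit targets */
typedef unsigned __int128 u128;

/* Hypothetical helper: signed 128-bit add that reports overflow.
 * Returns true and stores the sum in *out when a + b fits,
 * returns false (with the wrapped value in *out) otherwise. */
static bool checked_add_i128(i128 a, i128 b, i128 *out) {
    /* Add in unsigned arithmetic, where wraparound is well defined. */
    u128 ua = (u128)a;
    u128 ub = (u128)b;
    u128 sum = ua + ub;

    /* Signed overflow happened iff both operands have the same sign
     * and the sign of the result differs from it. */
    u128 sign_bit = (u128)1 << 127;
    bool overflow = (((ua ^ sum) & (ub ^ sum)) & sign_bit) != 0;

    /* Converting back is implementation-defined in C; GCC and Clang
     * both define it as two's-complement wraparound. */
    *out = (i128)sum;
    return !overflow;
}
```

The whole check is a couple of XORs and an AND on top of the add, which the compiler can inline. Whether that actually beats the libgcc call on a given target is something to measure, not assume.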