by hudsonwillis 819 days ago
Interesting article. I looked into a few of the cases it mentions, and here are my thoughts:

1. Pretty much all libgcc builtins (especially those manipulating integers, i128, and bits) are known to be not especially well optimized, and probably not well maintained for the latest architectures. Every place I've worked uses libdivide and other hand-rolled implementations rather than __addvti3.

2. Many of the cases that compile inefficiently for 32-bit targets compile fine on amd64. Perhaps it's a lack of maintenance on 32-bit platforms?

3. When two intrinsics are used in the same function, the result is almost always slower than a hand-crafted snippet implementing the same semantics. Compilers aren't smart enough in this case.
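For point 1, here's a minimal sketch of what a hand-rolled replacement for __addvti3 might look like. __addvti3 is libgcc's trapping signed 128-bit add (it calls abort on overflow, and it's what -ftrapv can fall back to); the function below reports overflow instead. add_i128_checked is a hypothetical name, and the code assumes GCC or Clang on a 64-bit target with __int128 support.

```c
#include <stdbool.h>

/* Hypothetical helper: signed 128-bit add that returns true on overflow
   (like __builtin_add_overflow) instead of aborting like __addvti3.
   Assumes GCC/Clang with __int128 support. */
static bool add_i128_checked(__int128 a, __int128 b, __int128 *out)
{
    /* Add in unsigned arithmetic, where wraparound is well defined. */
    unsigned __int128 us = (unsigned __int128)a + (unsigned __int128)b;

    /* Converting an out-of-range unsigned value back to signed is
       implementation-defined; GCC and Clang define it as modulo 2^128. */
    __int128 s = (__int128)us;
    *out = s;

    /* Overflow iff a and b share a sign and the result's sign differs:
       then the sign bit is set in both (a ^ s) and (b ^ s). */
    return ((a ^ s) & (b ^ s)) < 0;
}
```

The sign-bit trick keeps the whole check branch-free, which is usually the point of rolling your own here; whether it beats the libgcc call on a given target is something to confirm in the generated assembly.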
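For point 3, one concrete reading (an assumption, since the comment doesn't name specific intrinsics) is chaining GCC/Clang's __builtin_mul_overflow and __builtin_add_overflow to get an overflow-checked a * b + c. A hand-crafted alternative does the whole computation once in unsigned __int128, which can't wrap because (2^64 - 1)^2 + (2^64 - 1) = 2^128 - 2^64 < 2^128. Both function names are hypothetical, and whether the wide version is actually faster depends on the compiler and target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two builtins chained: overflow-checked a * b + c.
   Both builtins store the wrapped result, so *out always holds
   (a * b + c) mod 2^64; the return value is true on overflow. */
static bool muladd_two_builtins(uint64_t a, uint64_t b, uint64_t c, uint64_t *out)
{
    uint64_t prod;
    bool mul_ovf = __builtin_mul_overflow(a, b, &prod);
    bool add_ovf = __builtin_add_overflow(prod, c, out);
    return mul_ovf || add_ovf;
}

/* Hand-crafted: one 128-bit multiply-add, then inspect the high half.
   Assumes a compiler with unsigned __int128 (GCC/Clang, 64-bit targets). */
static bool muladd_wide(uint64_t a, uint64_t b, uint64_t c, uint64_t *out)
{
    unsigned __int128 r = (unsigned __int128)a * b + c;
    *out = (uint64_t)r;        /* low 64 bits, same as the wrapped result */
    return (r >> 64) != 0;     /* overflow iff the high half is non-zero */
}
```

Both functions return the same flag and write the same low 64 bits, so they can be swapped freely; comparing their output from `gcc -O2 -S` on your target is the quickest way to see whether the compiler merges the two builtins or not.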