by veddan 3588 days ago
There are some methods to get around this. For example, there's an ELF extension called STT_GNU_IFUNC. It allows a symbol to be resolved at load time using a custom resolver function. This avoids the problem of figuring out which code-path to use on every invocation.

For example, you could have a function

    void hash(char *out, const char *in);
with two different possible implementations: a slow one using common instructions, and a fast one using exotic instructions. You can then have a resolver like this:

    void fast_hash(char *out, const char *in);
    void slow_hash(char *out, const char *in);

    void (*resolve_hasher(void))(char *, const char *)
    {
        if (cpuSupportsFancyInstructions()) {
            return &fast_hash;
        } else {
            return &slow_hash;
        }
    }

I'm a bit skeptical about the performance, especially with often-called functions.

Normally, asm would do

    call slow_hash
at every place where slow_hash is invoked, but now every invocation has to go through a pointer that holds the address of the function.

Of course the loader could walk through all uses of the pointer to slow_hash and replace them with fast_hash at load time, but that won't work for self-modifying (packed or RE-protected) code.

GCC introduced __attribute__((ifunc(...))) precisely for this use case: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attribute...
The resolution happens only once, just like for other dynamic symbols: the result of the custom resolver call gets installed into the GOT slot that the PLT stub jumps through, so subsequent calls go straight to the right place without re-running the resolver.
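
To make that concrete, here is a rough sketch of how the resolver from the parent comment might be hooked up with the attribute. It assumes GCC or Clang on an ELF target with ifunc support (e.g. Linux with glibc). Both hash bodies are made-up stand-ins that compute the same toy checksum, and the AVX2 check is only a placeholder for "fancy instructions"; none of that comes from the thread.

```c
/* Sketch only: needs GCC/Clang on an ELF platform that supports ifunc. */

/* Stand-in implementations (assumed); a real fast_hash would use the
 * exotic instructions. Both write a one-byte checksum to out[0]. */
static void slow_hash(char *out, const char *in)
{
    unsigned char h = 0;
    for (; *in; ++in)
        h = (unsigned char)(h * 31u + (unsigned char)*in);
    out[0] = (char)h;
}

static void fast_hash(char *out, const char *in)
{
    slow_hash(out, in); /* placeholder: same result, pretend it is faster */
}

/* Runs once, when the dynamic loader resolves the symbol `hash`. */
static void (*resolve_hasher(void))(char *, const char *)
{
#if defined(__x86_64__) || defined(__i386__)
    /* GCC documents that __builtin_cpu_init must be called explicitly
     * before __builtin_cpu_supports inside an ifunc resolver. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return fast_hash;
#endif
    return slow_hash;
}

/* `hash` has no body of its own; the loader binds it to whatever
 * resolve_hasher returns. */
void hash(char *out, const char *in)
    __attribute__((ifunc("resolve_hasher")));
```

Callers just write hash(buf, "abc") as if it were an ordinary function. One caveat: the resolver runs very early, so it should stick to things that don't depend on the rest of the program having been relocated yet.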

That's the point of doing this in the linker - if you were going to look up the right function every call, you could do that entirely without special linker or compiler support.
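
For comparison, a minimal sketch of the do-it-yourself version in plain C, with no linker support. Instead of checking on every call, it caches the choice in a function pointer that starts out pointing at a trampoline. All names here (hash_ptr, hash_trampoline, resolve_count, cpu_supports_fancy_instructions) are hypothetical, and the feature probe is hard-coded to false purely for illustration.

```c
#include <stdbool.h>

typedef void (*hash_fn)(char *out, const char *in);

/* Stand-in implementations (assumed): same toy one-byte checksum. */
static void slow_hash(char *out, const char *in)
{
    unsigned char h = 0;
    for (; *in; ++in)
        h = (unsigned char)(h * 31u + (unsigned char)*in);
    out[0] = (char)h;
}

static void fast_hash(char *out, const char *in)
{
    slow_hash(out, in); /* placeholder for the exotic-instruction version */
}

/* Hypothetical feature probe; a real one would query the CPU. */
static bool cpu_supports_fancy_instructions(void)
{
    return false;
}

static int resolve_count = 0; /* counts how often we pick an implementation */

static void hash_trampoline(char *out, const char *in);

/* Every call goes through this pointer; it starts at the trampoline. */
static hash_fn hash_ptr = hash_trampoline;

/* First call lands here: choose once, overwrite the pointer, forward.
 * Not synchronized; a threaded program would need an atomic store. */
static void hash_trampoline(char *out, const char *in)
{
    ++resolve_count;
    hash_ptr = cpu_supports_fancy_instructions() ? fast_hash : slow_hash;
    hash_ptr(out, in);
}

void hash(char *out, const char *in)
{
    hash_ptr(out, in); /* one indirect call per invocation */
}
```

After the first call, each invocation costs one indirect call through hash_ptr, which is roughly what a PLT call costs anyway. What ifunc adds is that the binding is done by the loader, so no trampoline or extra wrapper is needed in your own code.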

In principle the compiler can inline the specialized hash function and specialize its caller instead (recursively). GCC is supposed to do that, but I hear that the optimization is still a bit unreliable.
That's why you macro and inline your code.

calls and jxx's are expensive, hence CMOV ^_^