| > Yes, but often a one off step that sets all your calls to call through a pointer, so each call site in a dynamic executable is slower due to an extra indirection. That is true; however, in tight loops or hot code paths it is unwise to take a jump anyway (even into a subroutine in close locality). If the overhead of invoking a function in performance-sensitive or critical code is considered too high, the code has to be rewritten to do away with the call, which is called micro-optimisation. The same holds for static linking. Dynamic libraries do not cater for micro-optimisations (which are rare) anyway: they trade maximum code performance for greater convenience. > The cache is not unlimited nor laid out obviously in userspace […] I should have made myself clearer. I was referring to the pre-linked shared library cache, not the CPU cache. The pre-linked shared library cache reduces process startup time and offers a better user experience; it has nothing to do with the run-time performance of the code. > So you suffer more page faults than you otherwise have to in order to load one function in a page and ignore the rest. I will experience significantly fewer page faults if my «strlen» code is served from a single memory page at a single address to the 10k processes invoking it (the dynamic library case) than if 10k copies of the same «strlen» are sprawled across 10k distinct memory pages at 10k distinct addresses (the static linking case). |
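
To put rough numbers on the «strlen» argument, here is a back-of-the-envelope sketch. Everything in it is an assumption made for illustration: a 4 KiB page size, a routine whose code fits in a single page, and 10k processes that are each a distinct statically linked binary, so that none of their code pages are shared. The helper name `resident_code_pages` is hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (typical, but not universal)


def resident_code_pages(processes: int, pages_per_copy: int, shared: bool) -> int:
    """Physical pages needed to hold one routine's code across all processes.

    shared=True models the dynamic library case: one copy mapped into
    every process. shared=False models distinct static binaries, each
    carrying its own private copy of the routine.
    """
    copies = 1 if shared else processes
    return copies * pages_per_copy


processes = 10_000  # the "10k processes" from the comment

dynamic_pages = resident_code_pages(processes, pages_per_copy=1, shared=True)
static_pages = resident_code_pages(processes, pages_per_copy=1, shared=False)

print(dynamic_pages, dynamic_pages * PAGE_SIZE)  # 1 4096
print(static_pages, static_pages * PAGE_SIZE)    # 10000 40960000
```

Under these assumptions the shared copy needs one page to be faulted in once, while the static case can fault in up to 10k separate pages, one per binary. Real figures depend on how the linker lays out the routine and on how many of the processes run the same executable.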