Hacker News
by davidp 4502 days ago
I'm puzzled by the idea of a system being leaner/faster with n copies of a library in physical RAM rather than 1 copy mapped via VMM into whatever process wants it. IIRC this was the main point of shared libraries, not pluggability or changing code during runtime. Am I missing something?
7 comments

Static linking doesn't necessarily link the entire library, unless the entire thing compiles to a single .o file. Linkers are smart enough to only link in the object files needed by the program. So assuming you are only linking in well-designed libraries, I guess it's possible that statically-linked software will be smaller since it will leave out the stuff you aren't using.
I recently played around with this, taking a rather small project (around 15,000 lines of code), putting it all into a single file and compiling. It did produce a smaller executable, but the real gain was in making every function static (since it's all in a single file). Doing that, a total of 41 functions were eliminated (either inlined or not used at all).

Was it worth the effort? Eh. But it was instructive, and I'd like to try it on a larger project when I get some time.
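A toy model of why making every function `static` helped: a function with external linkage has to be kept because some other object file might call it, while a `static` function with no remaining call sites (for example, because every call to it was inlined) can be deleted. This sketch takes the post-inlining call graph as given; the function names are hypothetical.

```python
def removable(functions: dict[str, bool], calls: dict[str, set[str]]) -> set[str]:
    """functions: name -> True if declared static.
    calls: caller -> callees still called (i.e. not inlined away).
    Returns the functions the compiler may drop from this translation unit."""
    # Anything with external linkage might be called from another .o: keep it.
    live = {name for name, is_static in functions.items() if not is_static}
    stack = list(live)
    while stack:  # everything reachable from a kept function is kept too
        for callee in calls.get(stack.pop(), set()):
            if callee not in live:
                live.add(callee)
                stack.append(callee)
    return set(functions) - live


# Post-inlining call graph: helper()'s only call site was inlined into parse().
calls = {"main": {"parse"}, "parse": set(), "helper": set()}
separate_files = {"main": False, "parse": False, "helper": False}
one_file_all_static = {"main": False, "parse": True, "helper": True}

print(removable(separate_files, calls))       # set()
print(removable(one_file_all_static, calls))  # {'helper'}
```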

Some popular software, like SQLite, combines all its sources into one big source file, called the amalgamation, and then compiles that. Their benchmarks show a modest but not negligible performance gain.

There's a lot of work going on in link-time optimization at the moment, in both LLVM and GCC. It's not quite ready for prime time: it still takes more than a small change in your Makefile to deploy it (e.g. dealing with linkers).

With the LLVM toolchain you can compile C code (or other high-level code) into LLVM IR, link the IR files together, and run the result through the optimizer.
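A sketch of that pipeline driven from Python, assuming `clang`, `llvm-link` and `opt` from the same LLVM release are on `PATH`. The helper only builds the command lines; `run_all` executes them. The intermediate file names are arbitrary choices. In everyday use, passing `-flto` to GCC or Clang does much the same thing behind the scenes.

```python
import subprocess
from pathlib import Path


def ir_link_commands(sources: list[str], output: str, opt_level: str = "-O2") -> list[list[str]]:
    """Command lines for: C -> bitcode, link bitcode, optimize, build binary."""
    commands = []
    bitcode = []
    for src in sources:
        bc = str(Path(src).with_suffix(".bc"))
        # Emit LLVM bitcode instead of a native object file.
        commands.append(["clang", opt_level, "-emit-llvm", "-c", src, "-o", bc])
        bitcode.append(bc)
    # Merge every module into one, so the optimizer can see across file boundaries.
    commands.append(["llvm-link", *bitcode, "-o", "whole.bc"])
    commands.append(["opt", opt_level, "whole.bc", "-o", "whole.opt.bc"])
    # clang accepts bitcode input and generates native code for it.
    commands.append(["clang", "whole.opt.bc", "-o", output])
    return commands


def run_all(commands: list[list[str]]) -> None:
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failing step


# Example: run_all(ir_link_commands(["main.c", "util.c"], "prog"))
```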

You will notice that modern optimizers will want to inline everything if possible and a lot of functions will be missing from the resulting binary. Boundaries of object files are perhaps the biggest obstacle in optimization today.

You essentially did Whole Program Optimisation by hand.
We did that (as a developer option) with KDE as well, since KDE 2 or thereabouts?

With the automake-based build system you'd pass "--enable-final" and the buildsystem would cat all the source files together and compile the whole damn thing at once (and really stress-test the kernel and gcc).
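A minimal sketch of what an `--enable-final`-style unity build generates, using a hypothetical helper `make_unity_source`. Instead of literally concatenating the files, it `#include`s each one, which gives the compiler the same single translation unit while keeping diagnostics pointing at the original file names.

```python
from pathlib import Path


def make_unity_source(sources: list[str]) -> str:
    """One translation unit that pulls in every source file of a module."""
    # #include rather than pasting the text, so compiler errors still name
    # the original file and line.
    lines = [f'#include "{Path(src).as_posix()}"' for src in sources]
    return "\n".join(lines) + "\n"


# Hypothetical module sources; write the result out and compile it instead.
print(make_unity_source(["kfoo.cpp", "kbar.cpp", "widgets/kbaz.cpp"]), end="")
```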

With KDE 4 I believe it is -DKDE4_ENABLE_FINAL=TRUE passed to cmake.

It was never quite 100%... sometimes you'd run into things like different source files in a module declaring the same class name, insufficiently-namespaced header include guards, etc. But it was definitely interesting.
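The include-guard problem is easy to check for ahead of time. This hypothetical helper `duplicate_guards` finds headers that reuse the same guard macro; in a unity build such headers are more likely to land in the same translation unit, where the second one would be silently skipped. It is regex-based and only recognises the common `#ifndef X` / `#define X` pattern.

```python
import re
from collections import defaultdict

# "#ifndef NAME" immediately followed by "#define NAME" (same macro name).
GUARD_RE = re.compile(r"^\s*#\s*ifndef\s+(\w+)\s*\n\s*#\s*define\s+\1\b", re.MULTILINE)


def duplicate_guards(headers: dict[str, str]) -> dict[str, list[str]]:
    """headers: file name -> contents. Returns guard macro -> files sharing it."""
    seen: dict[str, list[str]] = defaultdict(list)
    for name, text in headers.items():
        match = GUARD_RE.search(text)
        if match:
            seen[match.group(1)].append(name)
    return {guard: names for guard, names in seen.items() if len(names) > 1}
```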

That approach is common practice when developing for game consoles.
> So assuming you are only linking in well-designed libraries

This is a very big assumption.

Dynamic linkers are also clever enough to only mmap the required parts of dynamic library.

It's been a while since I've mucked about with these sorts of things, but I'm glad my hunch is confirmed: if static linking only brings in used functions, why doesn't dynamic loading do the same? (Apparently it does.) Much of this railing against dynamic loading wasting resources seems to be complaining about the wrong things, either bad dynamic linkers or bad libraries, neither of which will be fixed by static linking.

Don't get me wrong, there are places where I think static linking is ideal. I wish more distributors of binary-only software would statically link, or at least ship the dynamic libraries they require alongside the program, rather than rely on system dynamic libraries.

I wish them luck in their experiment and hope they can improve static linking, but I suspect they will learn more about why dynamic loading "wastes" so many resources the more they come in contact with real world libraries.

AFAIK static linking doesn't bring in "used functions".

It brings in used libraries, all at once. E.g. if you used sincos() from math.a, and math.a contained 47 other math functions, then you'd get all 48 math functions in your static binary just from using sincos().

Someone correct me if I'm wrong but I believe it's only with good whole program optimization at link time that it's possible to truly prove that a function is unneeded and exclude it (and then re-link if needed to re-resolve symbols to their new address in virtual memory).

You are wrong... sort of. It brings in used objects. A library can be made up of many objects; unused ones will be discarded.
Ah, good point, thanks for the correction.
I don't know the Linux dynamic loader that well; my comment was based on what is possible in other operating systems in general, and on what has been done in operating-system research.
The whole library gets mmapped; there is no point doing otherwise, since the mapping itself is cheap.

You are probably talking about demand paging (which also happens with statically linked binaries).
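A toy model of demand paging: mapping a file only sets up address space, and each page is read in the first time it is touched. The same applies to a statically linked executable and to an mmapped shared library. Real kernels also read ahead, so more pages than this may actually be loaded; the sizes here are made up.

```python
PAGE_SIZE = 4096


class MappedFile:
    def __init__(self, size: int):
        self.size = size
        self.resident: set[int] = set()  # page numbers already loaded
        self.faults = 0

    def touch(self, offset: int) -> None:
        if not 0 <= offset < self.size:
            raise ValueError("offset outside mapping")
        page = offset // PAGE_SIZE
        if page not in self.resident:
            self.faults += 1  # first touch: page fault, read from disk or page cache
            self.resident.add(page)


lib = MappedFile(2 * 1024 * 1024)  # a hypothetical 2 MiB library, mapped in full
for offset in (0, 100, 5000, 1_000_000):
    lib.touch(offset)
print(lib.faults, "of", lib.size // PAGE_SIZE, "pages loaded")  # 3 of 512 pages loaded
```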

You're probably talking about Linux; there are other types of dynamic loaders out there.
Ah, interesting, do you have some links?
Combined with modern compilers and link-time optimization, linkers can even leave out code when a library is just one .o file. Dead code elimination works great in that setting.
On top of what others have said, it takes some time to dynamically load a library into an address space. There are tables that may need to be walked and updated with correct pointers. For large libraries, this can be quite measurable. A statically linked executable will be memory-mapped and then brought in lazily as the program runs. And then if executed again, everything is mapped and loaded, so there is zero delay. Compare to dynamic loading, which will require table updates again (especially with ASLR).

This is why I like to rebuild my shell and associated tools as static binaries: shell scripts run 10-20% faster (lots of small programs running repeatedly).
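A toy model of the per-process work described above: for each imported symbol, search the loaded libraries in order and write load base plus symbol offset into a table slot (roughly what filling a GOT amounts to). Because ASLR picks new bases on every run, the results cannot be reused across executions. Library names, symbols and offsets are hypothetical, and real loaders use per-library hash tables and can defer function lookups until the first call (lazy binding).

```python
import random


def resolve_imports(imports, libraries, bases):
    """imports: symbol names, one table slot each.
    libraries: list of (lib_name, {symbol: offset}) in search order.
    bases: {lib_name: load address for this process}.
    Returns (table, lookups), where lookups counts symbol-table probes."""
    table, lookups = [], 0
    for sym in imports:
        for lib_name, symtab in libraries:
            lookups += 1
            if sym in symtab:
                table.append(bases[lib_name] + symtab[sym])
                break
        else:
            raise LookupError(f"undefined symbol: {sym}")
    return table, lookups


libs = [("libfoo.so", {"foo": 0x1000}), ("libc.so", {"printf": 0x5000, "malloc": 0x7000})]
for _ in range(2):  # two "executions", each with fresh randomized bases
    bases = {name: random.randrange(0x7F0000000000, 0x7FFF00000000, 0x1000) for name, _ in libs}
    table, lookups = resolve_imports(["printf", "malloc", "foo"], libs, bases)
    print([hex(addr) for addr in table], "lookups:", lookups)
```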

> it takes some time to dynamically load a library into an address space. There are tables that may need to be walked and updated with correct pointers

True, but I'd expect that to be dwarfed by the I/O time required to load even a single 4k page from disk, vs. keeping one copy of a big dynamic library like glibc loaded for the whole system, with fixups done per-process.

Good points about ASLR and static-linking frequently-exec'd-and-exited processes like the shell; and certainly for embedded and HPC it makes sense. I guess the moral, as always, is to measure.

A lot of those pages will already be in the file cache (assuming his use case of small utilities running frequently). Anyway, it should be easy to test: since glibc will always be in memory, any differences in timing between the static and dynamic version should be those alleged loading costs.
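A minimal way to do that measurement: run a command many times and take the median wall-clock time. Python's own process-spawning overhead is included in every sample, so compare the two builds against each other rather than reading the absolute numbers. The helper `time_command` is hypothetical.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv: list[str], runs: int = 200) -> float:
    """Median wall-clock seconds for one execution of argv."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Smoke test on the current interpreter; swap in your static and dynamic builds.
    print(f"{time_command([sys.executable, '-c', 'pass'], runs=5):.4f} s")
```

To compare, call it once on a statically linked build and once on a dynamically linked build of the same tool, e.g. `time_command(["./tool-static"])` versus `time_command(["./tool-dynamic"])` (placeholder paths).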
How many minutes are saved by 10-20% faster? A few seconds don't matter for a human.
We have some scripts at work that take 10+ hours to execute. Granted most of that time is in large child processes.
They get a speedup from eliminating the indirection used for calls across dylibs. From the sound of it they're also eliminating position-independent code, which itself can be a reasonable speedup, especially on 32-bit x86.
Maybe if we didn't have gigabytes of RAM these days. I quite like the idea of static linking. It makes software packaging and distribution very very easy and it has some security benefits. Go's build system is a good example of this. This project seems to be dead.
I have mixed feelings about the security gains.

On one hand, you eliminate one attack vector since you take ldd out of the equation. On the other hand, you depend on packagers who distribute their programs to rebuild and relink them every time a security issue creeps into a library they link with. I'm not sure I like that, and I don't have the free time I had in high school when compiling everything by hand seemed really fucking cool.

It's the distribution and packaging parts that I love about it. It might also help with cross-platform distribution. While dynamic linking does have advantages, I get really frustrated when an old application won't run on a new kernel because it requires old libraries that simply can't be installed. Static linking fixes that, and it's why most of what I write is statically linked!
Ever looked at Nix and NixOS?
i don't know about leaner, but static linking is faster. symbol resolution is a heck of a task to be run on every program load, especially for larger programs. i remember before linux optimized its dynamic linking, starting a gnome or kde program was a glacial process.

as for static linking being leaner, i have been told an entire shared library needs to be loaded if so much as 1 program needs a part of it, but i doubt it. i don't see why shared libraries can't be demand loaded just like executables are. then again, demand loading a shared library would be a more complex task, and i have reservations about complexity just like the suckless community does.

In generic distributions it would cause bloat, but in a single purpose embedded device, I found it was a significant size reduction.
If each of the n copies uses less than 1/n of the library, and the rest can be stripped away by the compiler, you need less memory than for one full shared copy.
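The arithmetic behind that, as a sketch: like the comment, it assumes the whole shared library is resident, and it ignores page granularity and everything other than library code. All sizes are hypothetical.

```python
def static_total(used_bytes: list[int]) -> int:
    """Each statically linked program carries only the code it uses."""
    return sum(used_bytes)


def shared_total(library_bytes: int) -> int:
    """One shared copy, assumed fully resident."""
    return library_bytes


library = 2_000_000                         # S: size of the library's code
used = [150_000, 300_000, 120_000, 80_000]  # n = 4 programs, each using < S/n
assert all(u < library / len(used) for u in used)
print(static_total(used), shared_total(library))  # 650000 2000000
```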