by cb321 359 days ago
You may well already be aware, but just in case you aren't: your bin-true benchmark mostly measures dynamic loader overhead, not fork-exec (e.g., I got a 5.2X speed-up using a true statically linked with musl-gcc vs. the dynamically linked glibc coreutils one). { Which of the two you want to measure is kind of a distro/cultural thing (static linking is common on Alpine Linux and the BSDs, less so on most Linux), but the effect is good to know about. }
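For anyone who wants to reproduce that kind of comparison, here is a rough sketch (not the benchmark from the article) that times spawning a binary many times and reports the mean. The function name `mean_spawn_seconds` and the `./true-static` path in the usage comment are made up for illustration. The Python parent adds its own overhead to every spawn, so only the difference between two targets is meaningful, not the absolute numbers.

```python
import os
import sys
import time


def mean_spawn_seconds(argv, runs=200):
    """Average wall-clock seconds to spawn argv and wait for it to exit.

    argv[0] must be a path (absolute or containing a slash);
    os.posix_spawn does not search PATH.
    """
    start = time.perf_counter()
    for _ in range(runs):
        pid = os.posix_spawn(argv[0], argv, os.environ)
        os.waitpid(pid, 0)  # reap the child so zombies do not pile up
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Example: python3 spawn_bench.py /bin/true ./true-static
    # (./true-static is a hypothetical statically linked build of true)
    for path in sys.argv[1:]:
        micros = mean_spawn_seconds([path]) * 1e6
        print(f"{path}: {micros:.0f} us per spawn")
```

Running it against a dynamically linked and a statically linked `true` on the same machine should show whether the loader dominates there.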

Yup, I added an osh-static column there, because I know dynamic linking slows things down. (With the latest release, we have a documented build script to make osh-static, which I tested with GNU libc and musl: https://oils.pub/release/latest/doc/help-mirror.html)

Although I think the CALLING process (the shell) being dynamically linked affects the speed too, not just the CALLED process (/bin/true).

I'd like to read an analysis of why that is, along with some deeper measurements!

The calling process being dynamically linked might slow fork() a lot, since it has to copy the page table setup for every mapped library, and then cost a tiny bit more in exec*() to tear those mappings down. Not sure whether something like a shell has vfork() available as an option, but I saw major speed-ups when launching processes from Python using vfork vs. fork. Of course, a typical Python instance has many more .so's linked in than osh probably has.
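A quick way to poke at the fork vs. vfork difference from Python, as a sketch only: compare a plain `os.fork()` + `os.execv()` against `os.posix_spawn()`. The assumption here is that the platform's posix_spawn avoids duplicating the parent's page tables (as far as I know glibc implements it with a vfork-style clone, but that is worth checking on your system). The helper names `spawn_fork`, `spawn_posix`, and `mean_seconds` are just illustrative.

```python
import os
import sys
import time


def spawn_fork(argv):
    """Classic fork + exec: the child gets a copy of the parent's page tables."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execv(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    return pid


def spawn_posix(argv):
    """posix_spawn: assumed here to take a vfork-like path in libc."""
    return os.posix_spawn(argv[0], argv, os.environ)


def mean_seconds(spawn, argv, runs=200):
    """Average wall-clock seconds per spawn-and-wait using the given spawner."""
    start = time.perf_counter()
    for _ in range(runs):
        os.waitpid(spawn(argv), 0)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 fork_vs_spawn.py /bin/true
    target = sys.argv[1:]
    for name, spawn in (("fork+exec", spawn_fork), ("posix_spawn", spawn_posix)):
        print(f"{name}: {mean_seconds(spawn, target) * 1e6:.0f} us per spawn")
```

Importing a pile of extension modules before timing would be one way to see how the gap changes as the parent maps more .so's.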

One could probably set up a simple linear regression to get a good estimate of the added cost per loaded .so on various OS-CPU combos, but I am unaware of a write-up of such. It'd be a good assignment for an OS class, though.
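The regression itself is only a few lines. A minimal sketch, assuming you have already collected (number of loaded .so's, mean spawn time) pairs yourself: `count_shared_objects` counts distinct libraries in Linux `/proc/<pid>/maps`-style text, and `fit_line` is ordinary least squares, where the slope is the estimated cost per extra .so and the intercept is the baseline cost. Both function names are made up for this example, and no real measurements are included.

```python
import os


def count_shared_objects(maps_text):
    """Count distinct .so files named in /proc/<pid>/maps-style text (Linux)."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            name = os.path.basename(path)
            if name.endswith(".so") or ".so." in name:
                paths.add(path)
    return len(paths)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as f:
        print("shared objects mapped in this process:",
              count_shared_objects(f.read()))
```

With xs as the .so counts and ys as the measured times, `fit_line(xs, ys)` gives the per-library cost estimate; repeating that per OS-CPU combo would give the comparison described above.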