by lgg 770 days ago
Windows and macOS both use a form of two-level namespacing, which does the same sort of direct binding to a target library for each symbol. Retrofitting that into a binary format is pretty simple, but retrofitting it into an ecosystem that depends on the existing flat namespace lookup semantics is not. I think it is pretty clever that the author noticed the static nature of the nix store allows them to statically evaluate the symbol resolutions and get the launch time benefits of two-level namespaces.

I do wonder if it might make more sense to rewrite the binaries to use Direct Binding[1]. That is an existing encoding of library targets for symbols in ELF that has been used by Solaris for a number of years.

1: https://en.wikipedia.org/wiki/Direct_binding
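
To make the flat versus two-level distinction above concrete, here is a toy Python model. It is only a sketch, not how any real dynamic linker is implemented: the library names, symbols, and the `flat_lookup` / `two_level_lookup` helpers are all made up for illustration.

```python
# Toy model only: each "library" is a dict mapping symbol name -> implementation.
# Library names and symbols here are hypothetical.
libs = {
    "libfoo.so": {"parse": lambda: "foo.parse", "init": lambda: "foo.init"},
    "libbar.so": {"parse": lambda: "bar.parse", "draw": lambda: "bar.draw"},
}
search_order = ["libfoo.so", "libbar.so"]


def flat_lookup(symbol):
    """Flat namespace: walk the search order; the first definition wins."""
    probes = 0
    for lib in search_order:
        probes += 1
        if symbol in libs[lib]:
            return libs[lib][symbol], probes
    raise LookupError(symbol)


def two_level_lookup(lib, symbol):
    """Two-level namespace: the binary recorded (library, symbol) at link time."""
    return libs[lib][symbol], 1


impl, probes = flat_lookup("draw")
print(impl(), probes)   # found in libbar.so after probing both libraries

impl, probes = flat_lookup("parse")
print(impl(), probes)   # libfoo.so wins, even if libbar.so was intended

impl, probes = two_level_lookup("libbar.so", "parse")
print(impl(), probes)   # bound directly to libbar.so, one probe
```

The point of the model is that the flat lookup both costs more probes and lets search order decide which `parse` you get, while the two-level lookup has neither property.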


I think you can get an effect similar to direct binding with symbol versioning.
You can get the same semantics as for direct binding using symbol versioning, but direct binding is faster.

Also, symbol versioning is only really better than direct binding if you end up having multiple versions of the same symbol provided by the same object, but that's relatively hard to use, so it's really only ever used for things like the C library itself. Mind you, that is a very valuable feature when you need it. In Solaris itself when we needed to deal with the various different behaviors of snprintf() there just wasn't a good way to do it, and only symbol versioning with support for multiple versions of a symbol would have helped.
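
A rough Python sketch of what "multiple versions of the same symbol provided by the same object" buys you. Everything here is hypothetical (`libc_toy`, `render`, and the `V1`/`V2` tags); it does not model the real snprintf() differences, only the mechanism of old binaries staying bound to the old behavior.

```python
# Toy model: one library exports two versions of the same symbol name.
# "libc_toy", "render", "V1" and "V2" are hypothetical names.
libc_toy = {
    ("render", "V1"): lambda s, n: s[:n],            # old behavior: truncated text only
    ("render", "V2"): lambda s, n: (s[:n], len(s)),  # new behavior: also reports full length
}
DEFAULT_VERSION = {"render": "V2"}  # what newly linked binaries bind to


def link(symbol, version=None):
    """Record a (name, version) reference, as a link-editor would at build time."""
    return (symbol, version or DEFAULT_VERSION[symbol])


def call(ref, *args):
    """Dispatch on the recorded (name, version) pair."""
    return libc_toy[ref](*args)


old_binary_ref = link("render", "V1")  # built before V2 existed
new_binary_ref = link("render")        # built today, gets the default version

print(call(old_binary_ref, "hello", 3))
print(call(new_binary_ref, "hello", 3))
```

Both binaries call something named `render` in the same library, yet each keeps the behavior it was built against.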

Not really... symbol versioning is a form of namespacing, but it is somewhat orthogonal to this.

Symbol versioning allows you to have multiple symbols with the same name namespaced by version, but you still have no control over what library in the search path they will be found in. So it does not improve the speed of the runtime searching (since they could be in any library on the search path and you still need to search for them in order), and it does not provide the same binary compatibility support and dylib hijacking protection (since again, any dylibs earlier in the search path could declare a symbol with the same name).
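
As a toy illustration of that point (a sketch in Python with made-up library names and version strings, not a model of any particular dynamic linker): the lookup key gains a version, but the search is still ordered, so an earlier library that declares the same pair wins.

```python
# Toy model: libraries export (name, version) pairs; lookup still walks the search path.
# All library names, symbols and version strings are hypothetical.
search_path = [
    ("libevil.so", {("read_config", "LIBAPP_1.0")}),  # earlier in the search path
    ("libapp.so", {("read_config", "LIBAPP_1.0")}),   # the library the author intended
]


def versioned_lookup(name, version):
    """Return (providing library, number of libraries probed)."""
    for probes, (lib, exports) in enumerate(search_path, start=1):
        if (name, version) in exports:
            return lib, probes
    raise LookupError((name, version))


print(versioned_lookup("read_config", "LIBAPP_1.0"))
```

Here `libevil.so` satisfies the reference simply because it comes first; the version string did nothing to pin the symbol to `libapp.so`.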

One could use symbol versioning to construct a system where you had the same binary protection guarantees, but it would involve every library declaring a unique version string, and guaranteeing there are no collisions. The obvious way to do that would be to use the file path as the symbol version, at which point you have reinvented mach-o install names, except:

1. You still do not get the runtime speed ups unless you change the dynamic linker behavior to use the version string as the search path, which would require ecosystem wide changes.

2. You can't actually use symbol versioning to do versioned symbols any more, since you overloaded the use of version strings (mach-o binaries end up accomplishing symbol versioning through header tricks with `asmname`, so it is not completely intractable to do even without explicit support).

> You still do not get the runtime speed ups unless you change the dynamic linker behavior to use the version string as the search path, which would require ecosystem wide changes.

Each ELF library declares the symbol versions it provides. The dynamic linker could track which library declares which versions, and cross reference that when it looks symbols up. I thought it did, but from empirical testing, it doesn't. But if it did, it would get similar speed improvements, assuming all libraries provide at least one version each (and of course, assuming no overlaps).
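
The cross-referencing idea above could look roughly like this. It is a hypothetical Python sketch under exactly the stated assumptions (every library declares at least one version, no two libraries declare the same one); the names `loaded`, `version_owner`, and `indexed_lookup` are invented for illustration.

```python
# Toy model of the idea above: index libraries by the versions they declare.
# Assumes every library declares at least one version and that no two
# libraries declare the same version string. All names are hypothetical.
loaded = {
    "liba.so": {"LIBA_1.0": {"open_a"}},
    "libb.so": {"LIBB_2.0": {"open_b", "close_b"}},
}

version_owner = {}
for lib, versions in loaded.items():
    for version in versions:
        if version in version_owner:
            raise ValueError(f"version {version} declared by two libraries")
        version_owner[version] = lib


def indexed_lookup(name, version):
    """Jump straight to the declaring library instead of walking the search path."""
    lib = version_owner[version]
    if name not in loaded[lib][version]:
        raise LookupError((name, version))
    return lib


print(indexed_lookup("close_b", "LIBB_2.0"))
```

With the index in place, the version string acts as the library selector, which is where the speedup would come from.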

That's what "direct binding" is. And as you can see, Linux doesn't support it.
> Symbol versioning allows you to have multiple symbols with the same name namespaced by version, but you still have no control over what library in the search path they will be found in.

Yes, but since the convention is to use the SONAME and SOVERSION in the symbol version, in practice the symbol version does (when adhering to this convention) help in binding symbols to objects.

Still, because this is an indirect scheme it does not help speed up relocation processing.

As you say, direct binding is better for safety and speed.

Absolutely, I just think "when adhering to this convention" is a high risk. Admittedly I mostly work on macOS so I don't have nearly as deep an experience with ELF, but in my experience, even when a system looks to be well maintained, you often find surprising numbers of projects being "clever" and breaking conventions as soon as you try to do something that depends on everyone actually globally following the convention.
That is much better than the Linux model!

Not only is there less crawling around looking for symbols, you're no longer in trouble when two libraries export the same symbol.

Especially given libraries are found by name, and symbols by name, where "type information" or "is that actually the library I wanted" are afterthoughts.

> you're no longer in trouble when two libraries export the same symbol.

Whether you use direct binding or symbol versioning, either way you don't have a problem with multiple libraries exporting the same symbol.

By the way, this is the fundamental problem with static linking for C: it's still stuck with 1970s semantics and you can't get the same symbol conflict resolution semantics as with ELF because the static linker-editors do not record dependencies in static link archives.

The key insight is that when you link-edit your libraries and programs you should provide only the direct dependencies, and the linker-editor should then record in its output which one of those provided which symbol. Compare to static linking where only the final edit gets the dependency information and that dependency tree has to get flattened (because it has to fit on a command-line, which is linear in nature).
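
A minimal sketch of that recording step, in Python. This is a toy link-editor with hypothetical library and symbol names, and the "first listed dependency wins" rule is an assumption of the sketch rather than a statement about any specific linker.

```python
# Toy link-editor: given only the direct dependencies, record which one
# provides each undefined symbol. All names are hypothetical.
def link_edit(undefined, direct_deps):
    """direct_deps: ordered list of (library name, set of exported symbols)."""
    bindings = {}
    for symbol in sorted(undefined):
        providers = [lib for lib, exports in direct_deps if symbol in exports]
        if not providers:
            raise LookupError(f"unresolved symbol: {symbol}")
        # Assumption of this sketch: the first listed dependency wins at link time.
        bindings[symbol] = providers[0]
    return bindings


deps = [
    ("libssl_toy.so", {"handshake", "log"}),
    ("libutil_toy.so", {"log", "strdup_safe"}),
]
bindings = link_edit({"handshake", "strdup_safe", "log"}, deps)
print(bindings)
```

The conflict over `log` is settled once, at link-edit time, and the recorded binding travels with the output, so nothing downstream has to re-resolve it against a flattened dependency list.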

The advantage of the Linux model is that you can refactor which library actually contains a function, which is done quite often.
Filters and auxiliary filters also do that, but Linux doesn't support them well, which is really sad because they make for a very neat system.