by gue5t 3662 days ago
C does have a standard ABI on each platform (where "platform" is slightly vague, ranging from "a place where people agree to use the SYSV ABI" to "Windows+MSVC"), so you can generally call into C libraries from the same "ecosystem" and not have to notice if they get recompiled between runs; library maintainers can put in some work and make promises about ABI stability.

The reason Rust doesn't have a defined ABI is basically that it wouldn't buy the same benefits it does in C. Specifying an ABI requires a lot of per-platform work (which the C community has already done), and, because of the importance of cross-crate inlining (generic functions are instantiated in, and often inlined into, the crates that call them), would not be sufficient to provide the benefit of in-place library updates. If you rewrite generic code in libfoo, and libbar depends on it, you can't get around recompiling libbar.

This is basically because Rust is a higher-level language where you use iterators, iterator adaptors, and higher-order functions in the course of writing libraries and applications. In C, you would manually inline things like iteration, writing for loops and populating intermediate data structures yourself. In Rust, this is something that can be factored out into libraries, but that means your code's meaning depends more deeply on the meaning of library code. To optimize away these abstractions and provide good performance, the compiler needs to inspect and make decisions based on library source code when compiling code that calls it. To permit efficiency, Rust basically has to be compiled from leaf dependencies upward.
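As a rough C sketch of that contrast (every name here is hypothetical and only for illustration): the first function is the hand-inlined loop you would normally write in C, and the second factors the iteration into a reusable, library-style helper driven by function pointers.

```c
#include <stddef.h>

/* Hand-inlined: filter, map, and sum fused into one loop by the programmer. */
long sum_even_squares(const int *xs, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (xs[i] % 2 == 0) {
            total += (long)xs[i] * xs[i];
        }
    }
    return total;
}

/* Library-style: the loop lives in one place and callers pass behavior in.
 * Hypothetical helper, not part of any real library. */
typedef int (*pred_fn)(int);
typedef long (*map_fn)(int);

long sum_mapped_if(const int *xs, size_t n, pred_fn keep, map_fn f) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (keep(xs[i])) {
            total += f(xs[i]);  /* indirect call on every element */
        }
    }
    return total;
}

/* Caller-supplied pieces. */
int is_even(int x) { return x % 2 == 0; }
long square(int x) { return (long)x * x; }
```

If `sum_mapped_if` lived in a shared library, it could be swapped out without recompiling its callers, but the compiler could not see through the function pointers across that boundary. Rust's generic iterator adaptors get the performance of the first version precisely because the library code is compiled into the caller, which is also why the caller has to be rebuilt when the library changes.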

In general, this is probably worth it, but it means we do need to rethink the C/UNIX style of packaging, which doesn't work very well when a libstd update implies every other package must also update. Some form of compiler middle/backend in the package manager (think Android's ART compiler), or a specialized form of binary or IR diffs (like Chrome uses) would probably go a long way. If we want to solve UNIX's problems, there will be a need for some cascading changes across the OS ecosystem.


Or how about we avoid the whole thing and allow multiple versions of a lib to exist, and then prune the branches as they are no longer needed? Something akin to Nix/guix, Gobolinux, or even GNU Stow.

C and C++ libraries can use ELF symbol versioning.

Say you have a function like

  int foo_do(struct foo *, int);
but to fix a bug and/or tweak the API you change it to

  long foo_do(struct foo *, int, int);
then ELF symbol versioning allows you to do

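  #include <assert.h>
  #include <limits.h>

  /* Default version: bound to programs linked against v1.1 or later. */
  __asm__(".symver foo_do_v1_1,foo_do@@v1.1");
  long foo_do_v1_1(struct foo *F, int arg1, int arg2) {
    ...
  }

  /* Compatibility shim: bound to programs linked against v1.0. */
  __asm__(".symver foo_do_v1_0,foo_do@v1.0");
  int foo_do_v1_0(struct foo *F, int arg1) {
    long rv = foo_do_v1_1(F, arg1, 0);
    assert(rv >= INT_MIN && rv <= INT_MAX);
    return (int)rv;
  }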
where the runtime linker will link foo_do_v1_0 as foo_do for programs originally compiled against the v1.0 release, while programs built against v1.1 will be linked to foo_do_v1_1. You can do this as often as you want, though you can't generally go back further than when you first began using ELF symbol versioning to compile and release libfoo. You only need to add an ELF .symver alias for functions that have multiple versions, but you do need to at least enable versioning (usually by specifying a version file with a catchall "*" entry which tags functions not explicitly aliased) at the point you begin maintaining a stable ABI.
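For a fuller picture, here is a minimal sketch of how the pieces might fit together. The `struct foo` layout, the function bodies, the `libfoo.map` file name, and the `FOO_SHARED_BUILD` macro are all assumptions made up for illustration. This variant lists the versioned export explicitly and hides everything else with `local: *` instead of using a global catchall.

```c
/*
 * Hypothetical version script libfoo.map, passed to GNU ld with
 *   -Wl,--version-script=libfoo.map
 *
 *   v1.0 { global: foo_do; local: *; };
 *   v1.1 { global: foo_do; } v1.0;
 */
#include <assert.h>
#include <limits.h>

/* Hypothetical layout; the original example leaves struct foo unspecified. */
struct foo { long total; };

/* Emit the version aliases only when building the shared library
 * (FOO_SHARED_BUILD is an assumed build flag, not a standard macro). */
#ifdef FOO_SHARED_BUILD
__asm__(".symver foo_do_v1_1,foo_do@@v1.1");  /* "@@" marks the default */
__asm__(".symver foo_do_v1_0,foo_do@v1.0");   /* "@" is compat-only */
#endif

/* Current implementation (placeholder body): add both arguments. */
long foo_do_v1_1(struct foo *F, int arg1, int arg2) {
    F->total += (long)arg1 + arg2;
    return F->total;
}

/* Old signature, kept as a thin shim over the new one. */
int foo_do_v1_0(struct foo *F, int arg1) {
    long rv = foo_do_v1_1(F, arg1, 0);
    assert(rv >= INT_MIN && rv <= INT_MAX);
    return (int)rv;
}
```

Built with something like `-shared -fPIC -DFOO_SHARED_BUILD` plus the version script, the library should export foo_do twice, once per version, and callers never see the `_v1_0` or `_v1_1` names.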

glibc is pretty much the only major library that makes use of this capability, despite the fact that it's been around for well over a decade. Most developers simply don't have the foresight or interest in providing rigorous forward and backward ABI and API compatibility. Partly that's because in the open source world, recompiling packages is much easier than in the proprietary world. And especially in the Windows world (where the CRT was never forward or backward compatible) you often packaged dependencies with your software, even if dynamically linked. And so newer languages like Go and Rust are being built with the presumption that both recompiling and bundled dependencies are the norm--it's what people are doing anyhow, and it simplifies the compiler and its runtime. It's sad that this is the norm, but that's beside the point.

Interesting, I had no idea the C runtime on Windows is not forward or backward compatible.

I'm curious about your thoughts on why the "recompiling and bundling dependencies" approach might not be the best way compared to ELF versioning facilities? Do you just feel like it's a less elegant solution?

Thanks

Embedded software is a huge security problem on the internet precisely because it's difficult to update. Once the vendor loses interest in maintaining it, it'll never be updated. With shared library systems like RedHat and Debian, you can at least upgrade shared components for a substantial period as long as the developer cooperates reasonably well.

With the movement to statically compiled apps, we're just going to see more and more ancient code running in the wild.

It's the same thing with containers like Docker. Even assuming a container is using something like RedHat or Debian, the very reason it's a container is because it's customized somehow. However it's done, the result is that maintenance and ownership of the basic software stack becomes increasingly fractured, and it will be more difficult to benefit from the work of the thousands of distribution contributors.

Static compilation and container approaches have much to recommend them. When you cut a release it's arguably better that you control all the dependencies. But what happens when development slows down, you lose interest, or you move on, as vendors of embedded software inevitably do? The Googles and Amazons of this world have armies of developers to fill in the gap. Statically compiled Go apps have almost no downsides for Google given how the company is built around their server infrastructure technology and devops army. But for everybody else who is an end-user of software incapable of taking ownership (which can apply to software companies, too), we're just going to see the same problems that have plagued, for example, router software and blogging software, expand.

In the ideal world, developers would pay attention to ABI and API stability, particularly developers of core components. And they would make it easy to design systems so that these core components could be updated without having to rebuild or reinstall the dependent software.

But we don't live in that ideal world (witness OpenSSL, which has horrible API stability[1]), so _sadly_ the path of least resistance is static compilation and, more recently, containers. And so new and very actively maintained software will see quicker releases, but the long tail of less actively developed software will grow increasingly insecure. And all the while developers will shift the blame onto system administrators and everybody else so that they won't have to be burdened by careful and conservative interface design.

[1] While OpenSSL has been moving toward improving their API and ABI stability, interestingly Google's BoringSSL has completely eschewed such stability. Why? Because they don't need that stability, as I explained above. But the vast majority of direct and indirect users of OpenSSL would benefit tremendously from improved stability, because it makes it easier to upgrade dependent software.

Eventually we will want to do updates, unless you prefer to keep the bugs. When we do, we'll need strategies to tame the quadratic space blowup caused by lack of sharing across the dependency tree relative to the current model.

Yes, but it means they can be done in the background without disrupting the workflow as much.

Install/build the new version while keeping the old in place, then flip over to the new when ready, and then start taking out the old one.

Thanks for the detailed response and insight.