Hacker News
by MichaelZuo 2098 days ago
It’s interesting that they use the word “broken” to describe incompatible machine code. Well, if the code is recompiled for each new version then it’s different from the old machine code; that’s by definition. Does any major software vendor support older versions of the ABI or machine code?
5 comments

> Does any major software vendor support older versions of the ABI or machine code?

Yes, this is extraordinarily common. The ABI is an interface, a promise that new versions of the machine code for a library can still be used by binaries compiled against the old one. There's new machine code, but there's no "by definition" of whether they make this promise or not.

glibc (and the other common libraries) on basically all the GNU/Linux distros does this: that's why it's called "libc.so.6" after all these years. New functions can be introduced (and possibly new versions of functions, using symbol versioning), but old binaries compiled against a "libc.so.6" from 10 years ago will still run today. (This is how it's possible to distribute precompiled code for GNU/Linux, whether NumPy or Firefox or Steam, and have it run on more than a single version of a single distro.)

Apple does the same thing; code linked against an old libSystem will still run today. Android does the same thing; code written to an older SDK version will still run today, even though the runtime environment is different.

Oracle Java does the same thing: JARs built with an older version of the JDK can load in newer versions.

Microsoft does this at the OS level, but - notably - the Visual C++ runtime does not make this promise, and they follow a similar pattern to what Nvidia is suggesting. You need to include a copy of the "redistributable" runtime of whatever version (e.g. MSVCR71.DLL) along with your program; you can't necessarily use a newer version. However, old DLLs continue to work on new OSes, and they take great pains to ensure compatibility.

Excellent comment, I was wondering how glibc handled backwards compatibility.

Is symbol versioning an ELF object file thing, or is it more universal than that?

Almost all of the time, they do it via just adding new features and not breaking old ones.

But yeah, GNU/Linux and Solaris both have symbol versioning as part of ELF (I'm not sure if other executable formats have it; it doesn't actually require very much out of the format). The approach, roughly, is that each symbol in the file is named something like "memcpy@GLIBC_2.2.5", and if you see symbol versions in the library you're linking against, you include those references. The dynamic linker is also smart enough to resolve unqualified symbols against some default version the library specifies. This is important for backwards-compatibility, for the ability for distros to add symbol versions when upstream doesn't have them yet, and for things like dlsym("memcpy") keeping working. When they make a backwards-incompatible change (e.g., old memcpy supports overlapping ranges, new memcpy does not promise to do the right thing and you need to use memmove instead), they add a new version (e.g., "memcpy@GLIBC_2.14"). Anything compiled against the newer library will reference the new version, but an implementation of the old version still sticks around for older binaries.
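
To make the lookup rule above concrete, here is a minimal sketch in Python. It is a toy model, not how ld.so is implemented: the `VersionedLibrary` class and the two memcpy stand-ins are hypothetical, and only the symbol names and version strings come from the comment above.

```python
# Toy model of ELF symbol versioning. This is NOT how ld.so is implemented;
# the class and the two memcpy stand-ins below are hypothetical.

class VersionedLibrary:
    def __init__(self, soname):
        self.soname = soname
        self.exports = {}   # (symbol name, version) -> implementation
        self.defaults = {}  # symbol name -> default version

    def export(self, name, version, impl, default=False):
        self.exports[(name, version)] = impl
        if default:
            self.defaults[name] = version

    def link(self, name):
        """Reference a newly compiled binary would record: the default version."""
        return (name, self.defaults[name])

    def resolve(self, name, version=None):
        """Load-time lookup; an unversioned request falls back to the default."""
        if version is None:
            version = self.defaults[name]
        return self.exports[(name, version)]


def memcpy_overlap_safe(buf, dst, src, n):
    # Stand-in for the old behaviour: copies via a temporary, so overlap is fine.
    buf[dst:dst + n] = buf[src:src + n]


def memcpy_forward_only(buf, dst, src, n):
    # Stand-in for the new behaviour: naive forward copy, wrong for overlapping ranges.
    for i in range(n):
        buf[dst + i] = buf[src + i]


libc = VersionedLibrary("libc.so.6")
libc.export("memcpy", "GLIBC_2.2.5", memcpy_overlap_safe)
libc.export("memcpy", "GLIBC_2.14", memcpy_forward_only, default=True)

old_binary_ref = ("memcpy", "GLIBC_2.2.5")  # recorded years ago, never changes
new_binary_ref = libc.link("memcpy")        # picks up the current default

for label, ref in [("old binary", old_binary_ref), ("new binary", new_binary_ref)]:
    buf = bytearray(b"abcdef")
    libc.resolve(*ref)(buf, 2, 0, 4)  # overlapping copy
    print(label, ref[1], bytes(buf))
```

The old reference keeps getting the overlap-tolerant implementation, while anything linked today records the newer version, so both kinds of binary can share one "libc.so.6".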

And yes, there were older versions before libc.so.6 - libc.so.5 was used, I think, in the 1990s, but they've avoided incompatible changes since then. (The approach used there is that you can install both of them on a single system, but "libc.so" symlinks to one of them, and that name is used when you compile code. When you run gcc -lfoo, it looks for libfoo.so, but if the library has a header saying its "real" name, called its "SONAME", is libfoo.so.1, the compiled program looks for libfoo.so.1 and not libfoo.so.) Now you only have to have a single glibc version and it works with many years of updates.
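
The SONAME mechanism in that parenthetical can be sketched the same way. This is a toy model in Python under stated assumptions: "libfoo" and the file layout are hypothetical, and a plain dict stands in for the filesystem.

```python
# Toy model of SONAME handling. "libfoo" and the file layout are hypothetical,
# and a dict stands in for the filesystem: a str value is a symlink target,
# a dict value is a real library file carrying its SONAME header.

fs = {
    "libfoo.so": "libfoo.so.2",              # dev symlink, used only when compiling
    "libfoo.so.1": {"soname": "libfoo.so.1"},
    "libfoo.so.2": {"soname": "libfoo.so.2"},
}


def follow(fs, name):
    """Follow symlinks until a real file is reached; KeyError if it is missing."""
    while isinstance(fs[name], str):
        name = fs[name]
    return name


def link(fs, short_name):
    """Mimic `gcc -lfoo`: open libfoo.so, but record the SONAME found inside."""
    real_file = follow(fs, f"lib{short_name}.so")
    return fs[real_file]["soname"]


def load(fs, needed):
    """Mimic the dynamic linker: look up the recorded SONAME, never libfoo.so."""
    return follow(fs, needed)


built_today = link(fs, "foo")      # records "libfoo.so.2"
built_long_ago = "libfoo.so.1"     # recorded when libfoo.so pointed at version 1

print(load(fs, built_today), load(fs, built_long_ago))
```

Because the binary stores the SONAME rather than the name passed to the compiler, both major versions can sit side by side, and the "libfoo.so" symlink only matters at build time.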

> Does any major software vendor support older versions of the ABI or machine code?

The C++ Standards Committee has been prioritizing ABI compatibility at the cost of performance for the last decade or so (mostly in the standard library, as opposed to the language itself, as I understand it). Some people (especially people from Google) have been arguing that this is the wrong priority, and that C++ should be more willing to break ABI. See:

https://cppcast.com/titus-winters-abi/

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p186...

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p213...

Disclosure: I work at Google with several of the people advocating for ABI breaking changes.

You'll notice some people from NVIDIA are authors on those papers too! :)
Note here that your binaries will continue to run even on future driver versions - and future hardware - that's what PTX is for, as the standard libraries are statically linked in.

It's just your object files that aren't compatible, so that you can't mix and match libraries built with different CUDA versions into the same binary.

Yep, this is a good summary (good enough that perhaps I should put something similar in the docs).
Famously Microsoft does with Windows. That's how an exe file from 25 years ago can still run today.
Yes, but GPU architecture changes very frequently.

Shaders from 15 years ago still work, but they're compiled on-the-fly to a GPU-dependent format. I expect you don't want to have to recompile an entire c++ stdlib every time you recompile your own code.

> I expect you don't want to have to recompile an entire c++ stdlib every time you recompile your own code.

That's basically our current model, I discussed this on Twitter recently.

https://twitter.com/blelbach/status/1307396914057326592

Do they use some kind of ABI versioning?
Running 32-bit x86 code on an AMD64 machine is possible on most operating systems that supported both, and probably has more to do with AMD64 supporting that execution model.
Try that on Linux and you'll find most libraries no longer have the same entry points and that various data structures have changed leading to fun fun crashes...

The kernel itself has maintained (mostly) ABI compatibility though.

That's a "you're holding it wrong" problem, though. Projects like GTK or Qt never claimed they'd be backwards-compatible for 26 years (Qt has specific API and ABI backwards-compatibility guarantees, and its developers are in my experience pretty diligent about them), so if you want a binary to work for a long time, you have to ship your own versions of these. Libraries like Xlib, on the other hand, are very stable and much more similar to the Win32 API in that respect. In theory Linux has versioning for libraries; in practice it is never used correctly and is useless anyway, since distros generally only keep around one version of everything, so even if you'd link against a specific version (e.g. libfoobar.so.2.21 instead of libfoobar.so.2, which will break if you don't recompile and/or patch the source), it wouldn't exist _anyway_ after a few updates. And that's mostly because distros never promised you'd be able to run binaries built outside their packaging infrastructure anyway; it being common practice and sometimes working doesn't imply it's guaranteed to work.

Hence why C applications linking only these "basic" libraries (libc, Xlib, zlib, ...) are regarded as so stable and portable: they're built and linked against system components which rarely change. (Keep in mind that you have to build this kind of binary on an ancient system; otherwise it will reference newer glibc symbol versions and won't work everywhere.)

This is one of those things it feels like all the content-addressable initiatives should be able to solve somehow. With near-ubiquitous internet access, why can't a program ship with a list of standard library hashes it'll link against, and my distro go fetch them from IPFS or whatever if they're not local?
Nix basically already does this, apart from the decentralised distributed cache (there is a centralised one and you can easily set up your own, too). All references, including to dynamically linked libraries are via unique, content addressable hash -- where "content" currently still happens to be content of the build recipe and all dependencies and sources, recursively, not the built artefact. There is work on referencing artefacts by the binary output hash though, because that obviously has better security properties when you want to have a non-centralized cache; the main problem is that a lot of software still has no reproducible build.
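
The difference between the two kinds of "content" can be sketched in a few lines of Python. This is not Nix's actual hashing scheme; the function names, recipe strings and truncated SHA-256 addresses are assumptions made purely for illustration.

```python
import hashlib
import json


def short_sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()[:16]


def input_address(name, recipe, dep_addresses):
    """Address derived from the build recipe plus the addresses of all dependencies."""
    payload = json.dumps(
        {"name": name, "recipe": recipe, "deps": sorted(dep_addresses)},
        sort_keys=True,
    )
    return short_sha256(payload.encode())


def output_address(artifact: bytes) -> str:
    """Address derived from the built bytes themselves."""
    return short_sha256(artifact)


# Input addressing: changing a dependency's recipe changes every dependent address.
libc_a = input_address("libc", "build libc, patch level 1", [])
libc_b = input_address("libc", "build libc, patch level 2", [])
app_a = input_address("app", "cc app.c", [libc_a])
app_b = input_address("app", "cc app.c", [libc_b])
print(app_a != app_b)

# Output addressing: only useful if two builds produce byte-identical artefacts.
reproducible = output_address(b"same bytes") == output_address(b"same bytes")
timestamped = output_address(b"built at 10:00") == output_address(b"built at 10:01")
print(reproducible, timestamped)
```

The last two lines show why reproducible builds matter for output addressing: an artefact that embeds something like a build timestamp gets a different address on every build, so an untrusted cache cannot be checked against it.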
The solution for this is now Docker, Flatpak, Snap, …

Just ship the whole environment and only rely on the stable kernel API.

This is what AppFS does, as well as CernVM-FS, though AppFS has more features.
We change the mangling of all the symbols by changing the inline namespace that they are in, regardless of whether or not functional ABI breaks occurred. That's why it says the ABI is broken on major releases. We do this to try and loudly break people who are trying to depend on ABI stability, instead of silently failing them.
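
A rough sketch of why renaming the inline namespace breaks loudly, written in Python as a simplified model of Itanium-style name mangling. The names "lib", "v1", "v2" and "f" are hypothetical (the actual namespace names are not given in this thread), and the mangler only handles a `void f()` nested in namespaces.

```python
# Simplified Itanium-style mangling, enough for `void f()` nested in namespaces.
# The names "lib", "v1", "v2" and "f" are hypothetical.

def mangle(namespaces, func):
    parts = "".join(f"{len(p)}{p}" for p in [*namespaces, func])
    return f"_ZN{parts}Ev"


def unresolved(needed, exported):
    """Symbols a binary needs that the library does not export."""
    return sorted(set(needed) - set(exported))


# Source code calls lib::f() in both releases; the inline namespace is invisible there.
binary_built_against_v1 = [mangle(["lib", "v1"], "f")]
library_v1_exports = [mangle(["lib", "v1"], "f")]
library_v2_exports = [mangle(["lib", "v2"], "f")]

print(binary_built_against_v1[0])
print(unresolved(binary_built_against_v1, library_v1_exports))
print(unresolved(binary_built_against_v1, library_v2_exports))
```

Under this model, an object file built against the old release has no symbol it can bind to in the new one, so mixing the two fails at link time instead of misbehaving at run time.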