Titus Winters, who leads Google's Abseil library, eventually came to the conclusion that the only sane way to manage a large-scale C++ system is to "live at head" [1] -- that is, libraries should live at the head production version of their dependencies.
This is patchworked around in more easygoing languages with dependency management systems, docker containers, etc. etc. but if you can enforce living at head from the start it makes everyone's life easier.
I can no longer count the number of times we had an issue with a "supposedly" minor release that ended up breaking major things in our stack. Most of them were things that could have been detected using unit tests or some kind of basic regression testing.
If you have 1000 dependency packages, and at any point in time 0.1% of them are broken, then odds are you will always have something broken.
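To make the arithmetic concrete: under the simplifying assumption that each package is broken independently with probability 0.1%, the expected number of broken packages is 1000 × 0.001 = 1, and the chance that at least one is broken is 1 − 0.999^1000 ≈ 63%. A minimal C++ sketch of that calculation (the function names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// Probability that at least one of n dependencies is broken, assuming each is
// broken independently with probability p. Real breakages are correlated, so
// treat this as a back-of-the-envelope model only.
double prob_something_broken(int n, double p) {
    assert(n >= 0 && p >= 0.0 && p <= 1.0);
    return 1.0 - std::pow(1.0 - p, n);
}

// Expected number of broken dependencies at any moment: n * p.
double expected_broken(int n, double p) {
    return n * p;
}
```

With `n = 1000` and `p = 0.001`, this gives roughly one broken package on average and about a 63% chance that something is broken right now.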
We should be clear... in these large organizations... HEAD is always broken. But it has the advantage of being broken for everyone, tested by everyone, fixed by everyone, and thus fixed for everyone. And this usually makes it far better than the alternatives.
Having 1000 dependencies with versions pinned means you are living alone and will run into fewer issues, but when they do come, they will be absolute nightmares that no one else is dealing with and no one can help with. And one day you'll have to play the game of begging someone else to upgrade their version of a downstream thing to fix the issue, and they won't, so you'll try to get the other group to backport the fix in their thing to the version you can't upgrade off. And they won't. etc. etc.
Full versioning is the worst of all approaches, IMO, for large complex interconnected codebases (especially ones that are many-to-many from libraries to output binaries), but it absolutely is sometimes the only viable one (for example, the entire open-source ecosystem is a giant(er) version of this problem, and in that space, versioning is the only thing that I can imagine working).
Supply chain attacks are worsened if everyone lives at head. Staying far enough behind that some brave (and hopefully small) project discovers the compromise of a repo for some dependency five layers deep before you re-pin to a new version is probably the best mitigation short of a permissions-based model like the one Austral is working on.
C++ doesn't have the same kind of head as you are thinking. The standard gets pretty well-tested before being standardized as the most recent ("head") version. C++'s head gets more testing than most libraries ever get for any version.
Anyone with proper experience in the C++ ecosystem knows this isn't the case.
Not only is ISO C++ full of defect reports (DRs) and things that probably shouldn't have been standardized in the first place (thankfully some of them were later removed), there is a plethora of compilers to choose from.
Yeah, everyone seemed to have started talking about living at head in general. My concern about living at head for C++ is primarily compiler and static analysis support. I like my pipeline to build with Clang and GCC and run the unit tests under both, plus clang-tidy (modernize), gcov, cppcheck, ASan, etc.
By the time all of that is supported for a language release we are on the next one, and I value that pipeline more than the changes currently being made.
The flip side is that exposure to software vulnerabilities is lengthened if people stay on older versions. So, you’ll be less vulnerable to intentional bugs in the software, but more vulnerable to unintentional bugs - and the latter are far more likely in practice.
Granted, the former can be quite a bit more severe - but that’s why we should do things like build on dedicated servers with restricted access to the internet etc.
There are older versions (no new features) and then there are older versions (no security updates). Most security updates don't break compatibility and can be installed without modifying anything that takes that version as a dependency.
This works as long as compatibility-breaking changes are kept rare, so that you can feasibly have someone doing security updates for each of the incompatible versions.
> We should be clear... in these large organizations... HEAD is always broken.
I don't know about all of them, but a lot of them have CI set up such that HEAD is never broken for some definition of broken.
Practically speaking, it's basically impossible to fuzz test everything, but typically builds and reasonably fast, reliable tests are run before new changes land at HEAD.
Being efficient about supporting this workflow is more or less what monorepo build systems like bazel, buck, and pants were designed for.
> it has the advantage of being broken for everyone, tested by everyone, fixed by everyone
The correct word here is not advantage but risk.
While making a product you rely on something, some other product/package. Thousands of them, if you are unlucky. If you have to collaborate on the development/testing/QA of 1000 packages, you will do nothing else! Also, try building a house on concrete that is still curing, with tools that aren't ready yet, I dare you! Careful developers rely on reliable things. It is a shame that this is not available in this industry. Either released(!) things are not ready yet, or they are already in decline; the sweet spot is tiny.
It depends on the context: in a React Native project you are almost guaranteed to realize the hard way that you depend on abandonware that won't ever support the latest React Native version or the latest smartphone operating systems, while in a Python project you are almost guaranteed to attempt to use something that is hard to install or compile, or just doesn't work on your slightly divergent platform (e.g. Windows).
Google concludes the way they’ve always done things is the only way that scales, news at 11. Staying at HEAD can be quite nice in some respects, but the conclusion there only applies to Google, and only then the Google that was actively created to be amenable to the results you’re seeing.
Yes, I realize. I didn't claim otherwise. The original post was about dropping support for older C++ standards. I was pointing out that Abseil's proposition is quite a bit more heavy-handed than Boost's.
This somehow reminds me of the argument around git-flow. It's decent reasoning; however, the whole idea is based on having a single, bleeding-edge version. Basically a SaaS.
A lot of companies/products do not work that way. Some have physical products out there that have to be updated, some have on-premises deployments, some sell user software of which there are multiple versions under support. Each of these live versions have to have their own source branches and dependency trees. A single `:latest` can render future bugfixes unbuildable.
I do it pretty much on my own for https://ossia.io ; it takes me roughly a day every other month to update my mac, linux and windows SDKs to the new LLVM / Qt / FFMPEG / {... other large dependency I use ...}.
Definitely not the end of the world. Said SDKs & build scripts are available here: https://github.com/ossia/sdk for anyone interested (I'll be honest though: the scripts are a mess!)
1 day for... ~500 kLOC, it seems. Which is pretty good I think, but larger codebases would have a lot more to deal with -- not just more dependencies, but also more breakages per dependency.
That works for Google web, where they can upgrade everything at will. When you work in embedded, upgrades can cost millions of dollars, since someone needs to go to remote locations to run the upgrade.
There are various ways to implement this, but to simplify the explanation, assume that versioning always works via lockfiles [1]. Lockfiles record all of the versions of all of the projects that contributed to some experiment or release. They're common in various language-specific ecosystems like gem, pip, cargo, and so on.
Assuming you have these lockfiles, you have the typical option, which would be to make a lockfile for each released entity, record it for later reproduction, and update it at least every release.
The "live at head" approach would be to instead have a main shared lockfile for every project in the company in a recorded sequence. All projects pick a version of that lockfile to release from. Practically speaking, all projects probably just take the latest version (the head) of that lockfile and everyone works hard to make sure that lockfile always works for everyone.
The main advantage here is pretty straightforward combinatorial math. Maintaining and validating unique combinations of dependencies for every release in a codebase is NP-silly, whereas sharing one set of dependencies across as many applications as possible isn't easy but it has a much nicer cost curve. In theory at least, but a lot of large organizations claim practice backs the theory up as well.
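A toy model of that combinatorial argument, assuming a lockfile is nothing more than a map from package name to pinned version (real lockfiles also carry hashes, sources, and so on). Each distinct lockfile is a dependency combination that somebody has to build, test, and keep patched; living at head collapses them into one. `Lockfile` and `combinations_to_validate` are hypothetical names for this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// A lockfile in this sketch: package name -> pinned version.
using Lockfile = std::map<std::string, std::string>;

// Number of distinct dependency combinations across all projects.
// Identical lockfiles collapse into a single entry in the set.
std::size_t combinations_to_validate(const std::vector<Lockfile>& per_project) {
    std::set<Lockfile> distinct(per_project.begin(), per_project.end());
    return distinct.size();
}
```

Three projects that each pin their own mix of versions need three combinations validated; three projects all sharing the head lockfile need one, no matter how many projects are added.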
[1] Versioning doesn't have to work this way. Putting all code into one big source repository (vendoring) has the same effect.
It does. Google mirrors every commit to LLVM in their monorepo, builds and tests the whole monorepo with a fresh Clang nightly, and (ideally) one of those Clang nightlies is released as the new stable compiler for all users of the monorepo every week. This helps keep Google at HEAD and helps keep LLVM upstream stable.
As far as I know, there was only ever one version of LLVM at a time in the monorepo. It's possible things changed after I left (either the compiler team or Google).
Upstream commits didn't land directly in the monorepo. Instead, there was a long-lived branch, and the compiler team had a buildcop rotation responsible for integrating from that branch into //third_party/llvm. This included running the tests (and fixing any problems) for any other software that depends on LLVM, as well as building an unstable crosstool and doing some basic smoke tests on it. Taking that crosstool through testing to stable crosstool was the responsibility of a different buildcop rotation: a special compiler-team tool tested the testing crosstool nightly, and to release to stable we used the ordinary presubmits, but for all projects at once, making its testing as similar as possible to any normal code change.
How does Google deal with other projects' embedded code copies? I.e., if some other project embeds a random old snapshot of LLVM, when you import that project into the monorepo do you end up with two copies of LLVM, or do you strip the old copy and port the codebase to the current LLVM in the monorepo?
"Head" at Google means "head of the google3 monorepo". I gather they stay close to upstream for GCC and LLVM. Also note that they aren't necessarily matching upstream offerings with respect to C++ standards flags. They aren't necessarily matching upstream for various third-party libraries either. "Head" for third-party libraries means "the latest vendored in from upstream", which doesn't have to be particularly new.
In contrast, third-party projects like Boost would probably consider the latest commit on the HEAD of their git repos to be "head". So "live at head" is a statement about how each organization should version its dependencies. It doesn't really make sense in the context of Boost maintainers deciding their support surfaces, since they inherently have all the "heads" to worry about: all the organizations using Boost libraries.
I'm surprised to see "GCC as shipped by RHEL 7" used as an argument downthread. If you need to use an older GCC then you can equally well use an older Boost.
People point out that Red Hat provides newer compilers (and newer patched C++ standard libraries depending on the default one!), and he says "It's not us, it's our customers. And no, they won't use any compiler which isn't the system default".
Either you only use the system defaults, and that includes the old Boost included with RHEL, or you don't. I guess he includes a private Boost copy as part of his project. Well, include a private GCC too then. A dependency is a dependency.
Those customers have silly requirements for no reason. The whole world shouldn't make efforts to accommodate them.
I am going to bet those customers are going to argue about "stability". All this comes from people reading "stable" as "rock solid, never crashes, always works as expected" when in the case of RHEL it actually means "it doesn't change, no new bugs introduced, and you can rely on the old bugs staying there". Those customers just need educating.
Library A uses Boost and requires an older version of GCC. Library B uses Boost and requires the newer version of Boost. You want to use libraries A and B in the same project, what now?
> Library B uses Boost and requires the newer version of Boost.
But the same question applies to Library B: if your regulations state that you can't update your compiler version past the default distro one, why can you bring in some random recent libraries that are definitely not part of the distro, since they depend on a Boost version that is more recent than the one your distro provides?
Of course this is all very stupid when you can install GCC 11 and compile in C++20 mode on RH 7 with the official devtoolsets...
But the core problem is, as always, tying compilers to Linux distros, as if a C++ compiler version were relevant in any way to your operating system's stability...
It's not a question of regulations. You either need to use the old library with the new compiler or the new library with the old compiler.
But the old one is crufty and barely maintained and nobody wants to touch it, and the new one is only using one feature of the new version of Boost, so it's easier to blacklist the newer version of Boost than to overhaul all the old code. But is that what we wanted to cause?
Moreover, with widely used core libraries like this, that sort of thing happens repeatedly, and now the downstream users have to do work they wouldn't have had to do if compatibility was maintained. At scale probably a lot more work than it would be for the widely used thing to maintain compatibility. That seems bad.
If you've chosen to use an ancient OS, with an obsolete compiler, you should not expect to be able to use anything new with it. Just keep the mothballed code as-is. Maybe find an equally old version of the library B if you must.
Or pay RedHat to backport the library B to the old Boost for you :)
Also you can almost certainly use a newer compiler to target old OSes.
The real problem is when you have a big pile of ancient, unmaintained code that uses long-deprecated stuff. At that point, no, you really can't expect to seamlessly interoperate with new code.
Like Node.js: just have both, dedupe when possible, static compilation, still fight over getting both libraries to co-operate with different Boosts, rage a bit, curse thee, thy name is dependency hell!
That can be made to work. It is not pleasant or easy but it is technically possible to compile different parts of the code with different compilers and library versions and link them together.
The vast majority of the time it is just not worth it.
EDIT: I'm not sure the below answers the question as asked, but it does clarify why A and B probably want to use the same version of boost.
In practice A and B would each have their own namespaces in C++ codebases, but that wouldn't resolve the tension if each wanted a different version of boost. One approach to resolve that tension is to figure out how to have two versions of boost in the same dependency tree. The below is addressing that proposal.
---
Practically, no. You could certainly create a new namespace for the C++ names: functions, classes, global variables, and so on.
But there are other "names" in C++ that don't respect C++ namespaces: C symbols, included headers, preprocessor names, and library names (as in `-lfoobar` on a link line). You'd need to make up new names for all of these and possibly a few more things to make a full duplicate of a library.
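A small sketch of two of those namespace-blind names, assuming a hypothetical library compiled twice as `demo_v1` and `demo_v2`. A function with C language linkage is a single entity regardless of which namespace declares it, and a macro defined "inside" a namespace is visible everywhere after its definition:

```cpp
#include <cassert>

// Both "copies" declare the same C-linkage function. The language treats
// these as the same function, so the linker sees exactly one symbol.
namespace demo_v1 { extern "C" int demo_c_symbol(); }
namespace demo_v2 { extern "C" int demo_c_symbol(); }
extern "C" int demo_c_symbol() { return 1; }

// The preprocessor runs before namespaces exist: this macro is not
// scoped to demo_v1 at all.
namespace demo_v1 {
#define DEMO_LIB_VERSION 1
}

namespace demo_v2 {
// Expands to demo_v1's value, because there is only one DEMO_LIB_VERSION.
inline int version_seen() { return DEMO_LIB_VERSION; }
}
```

Renaming the C++ namespace therefore does nothing for `demo_c_symbol` or `DEMO_LIB_VERSION`; each of those would need its own renaming scheme.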
Now, if you managed to do all that, there are still problems to watch out for. For instance, it's common for projects to assume that global or static names in a C++ namespace can be treated as unique process-wide values in a running program. As in, `mynamespace::important_mutex` might guard access to specific hardware, so having a `mynamespace2::important_mutex` also thinking it has the same job would be bad.
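A minimal sketch of that hazard, reusing the `mynamespace`/`mynamespace2` names from above (the guarded "hardware" is imaginary). Two renamed copies of the same global mutex are two unrelated objects, so both copies can believe they have exclusive access at the same time:

```cpp
#include <cassert>
#include <mutex>

// The same library compiled twice under different namespaces: each copy
// gets its own "one per process" mutex for the same device.
namespace mynamespace  { std::mutex important_mutex; }
namespace mynamespace2 { std::mutex important_mutex; }

// Returns true if both copies manage to "lock the device" simultaneously,
// i.e. the two mutexes provide no mutual exclusion between the copies.
bool both_copies_hold_the_device() {
    std::unique_lock<std::mutex> a(mynamespace::important_mutex, std::try_to_lock);
    std::unique_lock<std::mutex> b(mynamespace2::important_mutex, std::try_to_lock);
    return a.owns_lock() && b.owns_lock();
}
```

Nothing fails to compile or link here; the bug only shows up at runtime, when both copies touch the device at once.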
And if that wasn't a problem, you still have to think about types that show up in APIs. How will downstream code be written when you have a `boost::string_ref` and a `boost2::string_ref` in the same codebase? Which `string_ref` do you use where? Can you efficiently convert from one to the other as needed? Will that require changing a lot of downstream code?
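A minimal sketch of the type problem, with hypothetical `libdep_v1`/`libdep_v2` namespaces standing in for the two renamed copies (these are not Boost's real `string_ref` definitions). Identical definitions still produce unrelated types, so every boundary between the two worlds needs a hand-written conversion:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace libdep_v1 {
struct string_ref {
    const char* ptr;
    std::size_t len;
};
}

namespace libdep_v2 {
struct string_ref {
    const char* ptr;
    std::size_t len;
};
}

// Same spelling, same members, still two distinct types.
static_assert(!std::is_same<libdep_v1::string_ref, libdep_v2::string_ref>::value,
              "the two copies' types are unrelated");

// Cheap here, but it has to be written (and called) at every boundary, and it
// is only correct while the two versions keep compatible meanings.
libdep_v2::string_ref convert(libdep_v1::string_ref s) {
    return libdep_v2::string_ref{s.ptr, s.len};
}
```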
The only sane solution, for libraries that need wide backward and forward compat, is to only expose ABI/API-stable types in your interface. That doesn't rule out Boost: you can still use it internally, but make the symbols private and/or put them in a private namespace.
At the limit a stable interface is a C interface, but it doesn't have to be. GCC's std types are fairly stable, and Qt manages a rich interface while maintaining robust ABI compatibility. It is hard work, and not always worth it of course.
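A minimal sketch of the "C interface at the limit" end of that spectrum, assuming a hypothetical `greeter` library. Callers see only an opaque pointer and C functions; the implementation keeps its C++ (or Boost) types and helpers private, so they can change between releases without touching the interface:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ---- Public interface: only C-compatible types cross this boundary. ----
extern "C" {
typedef struct greeter greeter;  // opaque to callers
greeter* greeter_create(const char* name);
std::size_t greeter_message_length(const greeter* g);
void greeter_destroy(greeter* g);
}

// ---- Private implementation. ----
namespace {  // internal linkage: these helpers never clash with other libraries
std::string make_message(const std::string& name) { return "Hello, " + name; }
}

struct greeter {
    std::string message;  // layout is invisible to callers, so it may change
};

extern "C" greeter* greeter_create(const char* name) {
    assert(name != nullptr);
    return new greeter{make_message(name)};
}

extern "C" std::size_t greeter_message_length(const greeter* g) {
    return g->message.size();
}

extern "C" void greeter_destroy(greeter* g) { delete g; }
```

Qt takes a richer route (C++ classes with a private "d-pointer"), but the principle is the same: callers never depend on the layout of anything the library may want to change.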
Narrowing the interface helps, but the other "interface" is how the linker resolves names to specific addresses of code or data. The example I mention involving mutexes does not require those mutexes to show up in public interfaces or necessarily "break" ABI guarantees. The mutexes don't even have to be used by the same source files! I guess you could consider it a library design flaw, but it's basically never mentioned as a design antipattern, if it is one.
Note that it's not just mutexes. The same can happen with other kinds of "one per process" resources: memory pools, thread pools, connection pools, caches, registry objects, etc.
Yeah, I've had to do this with other dependencies (which we didn't have source for) that included an old or broken version of the same library we needed. It's a bit of a pain to get everything into a namespace, and of course there's the bloat in the executable.
Even more fun when two dependencies both use different versions of the same lib.
I much prefer bringing everything into our source tree up front and doing the build ourselves rather than just linking a prebuilt lib but sometimes you don't have that option.
Do you mean have library B built with its own separate copy of the new version?
e.g. You have Library A using LibDependency-1.0.0 and Library B using a separately compiled LibDependency-2.0.0? Then have MyAwesomeApp linking LibA and LibB and just accept the binary+memory overhead of two copies (albeit different versions) of LibDependency?
Probably, unless you need to share LibDependency data structures between 1.0.0 and 2.0.0, in which case, it depends on the implementation of LibDependency.
Rather than trying to do this at the language level with namespaces or whatever, it's probably easier to compile and link each version of the problem dependency into a separate library (static or dynamic), then to make sure each of your own libraries and executables only directly links to one version of the problem library.
This way, you don't have to rename or namespace anything, because conflicting names will only be exposed externally if you're linking to the problem library dynamically, in which case you should be able to arrange for the names to be qualified by the identity of the correct version of the problem library at dynamic load time (how to ensure this is platform-specific).
I've never tried it, but I read that one use for namespaces was taking a library and wrapping all the #include statements in namespace library {} or whatever to avoid one stepping on another. Depending on how the library is written (and if it's all in source form rather than .lib files), I guess it should work?
The problem is that because LibraryA and LibraryB have distinct copies of LibDependency, the source/API-compatible type you're using may have an incompatible internal structure.
As a library author there are things you can do for ABI compatibility, but they all basically boil down to providing a bunch of non-opaque API types that have some kind of versioning (either an explicit version number, or a slightly implicit size field, which IIRC is the MS standard). You also have opaque types, where the exposure of details is more restricted: generally either just an opaque pointer, or maybe a pointer to a type that has a vtable (either an automatic one or a manually constructed one). In general, use of the non-opaque portions of the API is fairly restricted because they have ABI implications, so a user of a library will communicate by providing non-opaque data to the library, but the library will provide largely opaque results with an ABI-stable API that can be used to ask questions of an otherwise opaque type.
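A minimal sketch of the "size field" style of versioned, non-opaque API type. The struct and function names are hypothetical; this shows the general pattern (new fields only appended, the library reads only what the caller says it provided), not any particular Windows API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// What older callers were compiled against.
struct render_options_v1 {
    std::size_t struct_size;  // caller sets this to sizeof(its struct)
    int width;
    int height;
};

// A later release appends a field; existing fields never move.
struct render_options {
    std::size_t struct_size;
    int width;
    int height;
    int dpi;  // new in the later release
};

// Library side: copy only the bytes the caller provided, default the rest.
render_options normalize(const void* caller_opts) {
    render_options out{sizeof(render_options), 0, 0, 96};  // 96 = default dpi
    std::size_t caller_size = 0;
    std::memcpy(&caller_size, caller_opts, sizeof caller_size);
    assert(caller_size >= sizeof(render_options_v1));
    std::size_t n = caller_size < sizeof out ? caller_size : sizeof out;
    std::memcpy(&out, caller_opts, n);
    out.struct_size = sizeof(render_options);
    return out;
}
```

An old caller passing `render_options_v1` gets the default `dpi`; a current caller gets its own value, and neither needs to be rebuilt when the library ships.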
This works in general, and it means you don't have to rebuild everything from scratch any time you update anything. It breaks down however when you have different versions of the same library in the same process. The problem is that while you see a single opaque type, it's not opaque to the library itself so while an opaque type from two different versions of a library may look the same to you, the implementation may differ between the two versions. Take a hypothetical:
which is a kind of generic, vaguely ABI-stable-looking thing (I'm in a comment box, assume real code would have more thought/have fewer errors), but let's imagine a "plausible" v1.0
There's been no source change, no feature change, and from a library/OS implementor's PoV no ABI change, but if I had an ArrayRef from the 1.0 implementation and passed it somewhere that would be using the 1.1 implementation, or vice versa, the result would be sadness.
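A hypothetical sketch of the same failure mode (the names and layouts are illustrative assumptions, not the commenter's code): two releases of an `ArrayRef` with the same public API whose private members were reordered. Treat the member functions as living inside the library; they are inline here only to keep the sketch in one file. The `memcpy` stands in for a v1.0 object reaching v1.1 code in the same process:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

namespace libarray_v1_0 {
class ArrayRef {
public:
    ArrayRef(const int* d, std::size_t n) : data_(d), size_(n) {}
    const int* data() const { return data_; }
    std::size_t size() const { return size_; }
private:
    const int* data_;
    std::size_t size_;
};
}

namespace libarray_v1_1 {
class ArrayRef {
public:
    ArrayRef(const int* d, std::size_t n) : size_(n), data_(d) {}
    const int* data() const { return data_; }
    std::size_t size() const { return size_; }
private:
    std::size_t size_;  // members swapped relative to v1.0
    const int* data_;
};
}

// What v1.1 code "sees" when handed the bytes of a v1.0 ArrayRef.
std::size_t size_seen_by_v1_1(const libarray_v1_0::ArrayRef& r) {
    static_assert(sizeof(libarray_v1_0::ArrayRef) == sizeof(libarray_v1_1::ArrayRef),
                  "assumes pointers and size_t have the same width");
    libarray_v1_1::ArrayRef other(nullptr, 0);
    std::memcpy(static_cast<void*>(&other), &r, sizeof other);
    return other.size();  // really the bits of the old data pointer
}
```

Callers that only pass an `ArrayRef` back into the library they got it from never notice the reorder; the breakage only appears when both versions share a process.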
As a library implementor there's a lot you have to do and/or think about to ensure ABI stability, and it is manageable, but more or less all of the techniques break down when the scenario is "multiple versions of the same library inside a single process".
We have used two Boost versions built in different namespaces. Most of it works, but there can be Fun if two now-independent versions of the same function use the same resource.
One big use case for Boost is to provide alternative implementation for standard containers and sometimes even language features that whatever toolchain you need to use doesn't have yet. Being bound to an ancient Boost version as well kind of limits the use you can get out of that.
You can build a new compiler on RHEL7 and ship the runtime libstdc++ with your program. We did that in the days of RHEL5, it was a bit of a hassle but it worked.
Why not go for C++17 already? C++14 seems pretty arbitrary; AFAIK C++17 is already fully supported by GCC, Clang, and MSVC [0]. It's also now the default dialect in a few compilers, including GCC. Meanwhile C++20 isn't really well supported at all at the moment [1], so it makes sense to wait longer for it, even though features such as Concepts and Modules should simplify library code a lot and make some tasks much more trivial.
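For reference, a minimal sketch of how a library can detect the dialect it is compiled under, which is the mechanism behind "dropping support" for a standard. `DEMO_CPLUSPLUS` and the helper names are hypothetical; the `_MSVC_LANG` fallback is there because MSVC only reports the real dialect in `__cplusplus` when `/Zc:__cplusplus` is passed:

```cpp
#include <cassert>

// Pick the macro that reflects the actual dialect on each compiler.
#if defined(_MSVC_LANG)
#define DEMO_CPLUSPLUS _MSVC_LANG
#else
#define DEMO_CPLUSPLUS __cplusplus
#endif

constexpr long kCpp14 = 201402L;
constexpr long kCpp17 = 201703L;

// A library raising its baseline to C++17 would typically make this a hard
// error in its headers:  #if DEMO_CPLUSPLUS < 201703L / #error ... / #endif
constexpr bool meets_baseline(long dialect, long minimum) { return dialect >= minimum; }

constexpr long compiled_dialect() { return DEMO_CPLUSPLUS; }
```

Whether the check is a warning, an `#error`, or a silently reduced feature set is exactly the policy question being debated here.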
The argument I can see is that 14 is the minimum feature set that every engineer knows how to use, and before 14 you are missing some of the "core" features of the language that are widely used today. By contrast, some things in 17 are still foreign to some engineers, although there are no real "breaking" changes from 14 to 17 (in terms of the model of how to think about the language). 20 and 23 are a bit of a mess in terms of support.
I’m living in a world that has to support a product which has many parts in C++ on Solaris, HP-UX, and OS/400 (in addition to Linux, AIX, and Windows). We can’t drop support yet for the AIX release that only supports up to C++11, although they have a newer Clang-based compiler. HP-UX and Solaris native compilers only go up to C++14. Those platforms have at least 5 years of enterprise support left, but no new chipsets mean no extended compiler development. And up-to-date GCC doesn’t build there.
And then IBM’s OS/400 ILE C++ support isn’t even up to C++11.
It is hard to live in Enterprise software land where you support everything for a lot of years, but end up stuck on older third party libraries when they move to new standards to keep parity across your product line’s feature set.
This reminds me of my time at Bloomberg. The C++ support from IBM and Oracle/Sun wasn’t nearly up to date, and without a large customer willing to foot the bill to improve the toolchains IBM and Oracle/Sun weren’t going to do the work.
There was a Linux migration project to move the company to Linux. I haven’t heard anything from people inside, but I’m willing to bet that there are still some stragglers that haven’t moved yet.
I really like what Herb Sutter is doing with cpp2/cppfront: it's a new language that translates to C++, defaulting to C++ good practices and "avoiding 95% of C++ pitfalls". Please watch his presentation on it. It's designed to interact with C++.
Backward compatibility is good to have, but C++ needs alternatives that allow it to drop support for old things, because the language needs to evolve, backward compatibility is preventing that, and it leads to very, very long compile times.
And even if C++ dropped backward compatibility, the break would be in the language, not in the ABI, so old C++ and new C++ could still easily coexist, much as C has lived alongside C++ for a long time now.
I like C++, but nobody denies that C++ carries a lot of baggage that makes it easy to dislike.
So I mainly do kernel or super-low-level systems code, and I've never had an issue targeting the latest C++ standard, given that I don't use the C++ STL and only use the language features. The backwards compatibility is amazing and I've never had a single issue in my career with moving forward. So good on them. I also do Windows work though, so maybe it's different on Linux.
[1] https://abseil.io/about/philosophy#we-recommend-that-you-cho...