by VHRanger 1205 days ago
Titus Winters, leading Google's Abseil library, eventually came to the conclusion that the only sane way to manage a large-scale C++ system is to "live at head" [1] -- that is, libraries should live at the head production version of their dependencies.

This is patchworked around in more easygoing languages with dependency management systems, docker containers, etc. etc. but if you can enforce living at head from the start it makes everyone's life easier.

[1] https://abseil.io/about/philosophy#we-recommend-that-you-cho...


That's fine as long as HEAD doesn't break things.

I can no longer count the number of times we had an issue with a "supposedly" minor release that ended up breaking major things in our stack. Most of them were things that could have been detected using unit tests or some kind of basic regression testing.

If you have 1,000 dependency packages, and at any point in time 0.1% of them are broken, then odds are you will always have something broken.
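
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes each package is broken independently and with the same probability, which is a simplification not stated above, and the function name is made up for illustration.

```python
def prob_any_broken(num_packages: int, p_broken: float) -> float:
    """Probability that at least one of num_packages is broken,
    assuming independent failures (an assumption, not a given)."""
    return 1.0 - (1.0 - p_broken) ** num_packages


# 1,000 packages, each broken 0.1% of the time
p = prob_any_broken(1000, 0.001)
expected_broken = 1000 * 0.001

print(f"P(at least one broken) = {p:.3f}")
print(f"expected broken packages = {expected_broken:.1f}")
```

Under those assumptions there is roughly a 63% chance that something is broken at any given moment, with one broken package expected on average. Correlated breakage (one bad release taking down many dependents) would change the numbers.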

We should be clear... in these large organizations... HEAD is always broken. But it has the advantage of being broken for everyone, tested by everyone, fixed by everyone, and thus fixed for everyone. And this usually makes it far better than the alternatives.

Having 1000 dependencies with versions pinned means you are living alone and will run into fewer issues, but when they do come, they will be absolute nightmares that no one else is dealing with and no one can help with. And one day you'll have to do the game of begging someone else to upgrade their version of a downstream thing to fix the issue, and they won't, so you'll try to get the other group to backport the fix in their thing to the version you can't upgrade off. And they won't. etc. etc.

Full versioning is the worst of all approaches, IMO, for large complex interconnected codebases (especially ones that are many-to-many from libraries to output binaries) but it absolutely is sometimes the only viable one (for example, the entire open-source ecosystem is a giant(er) version of this problem, and in that space, versioning is the only thing that I can imagine working).

Supply chain attacks are worsened if everyone lives at the head. Staying far enough behind that some brave (and hopefully small) project discovers the compromise of a repo for some dependency five layers deep before you re-pin to a new version is probably the best mitigation short of some permissions-based model like Austral is working on.
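
A minimal sketch of that "stay a bit behind" policy, assuming you know each version's publish date. The function name, the 30-day window, and the release history are all hypothetical, not taken from any real package manager.

```python
from datetime import date, timedelta
from typing import Optional


def newest_aged_release(
    releases: dict[str, date], today: date, min_age_days: int
) -> Optional[str]:
    """Return the most recently published version that is at least
    min_age_days old, or None if no release is old enough."""
    cutoff = today - timedelta(days=min_age_days)
    aged = {version: published for version, published in releases.items()
            if published <= cutoff}
    if not aged:
        return None
    # Pick the version with the latest publish date among the aged ones.
    return max(aged, key=aged.get)


# Hypothetical release history for some dependency
releases = {
    "1.4.0": date(2023, 1, 10),
    "1.4.1": date(2023, 2, 20),
    "1.5.0": date(2023, 3, 28),
}
pinned = newest_aged_release(releases, today=date(2023, 4, 1), min_age_days=30)
print(pinned)  # 1.4.1 (1.5.0 is only four days old)
```

The trade-off is that the same delay also holds back legitimate security fixes.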
C++ doesn't have the same kind of head as you are thinking. The standard gets pretty well-tested before being standardized as the most recent ("head") version. C++'s head gets more testing than most libraries ever get for any version.
Anyone with proper experience with the C++ ecosystem knows this isn't the case.

Not only is ISO full of DRs and things that probably shouldn't have been standardized in the first place (thankfully some of them were later removed), there is also a plethora of compilers to choose from.

Most people looking for modern C++ are choosing one of three compilers. Most code mostly works.
Yeah, everyone seemed to have started talking about living at head in general. My concern about living at head for C++ is primarily compiler and static analysis support. I like my pipeline to build for clang and gcc, run unit tests in both, and run clang-tidy modernize, gcov, cppcheck, ASan, etc.

By the time all of that is supported for a language release we are on the next one, and I value that pipeline more than the changes currently being made.

The flip side is that exposure to software vulnerabilities is lengthened if people stay on older versions. So, you’ll be less vulnerable to intentional bugs in the software, but more vulnerable to unintentional bugs - and the latter are far more likely in practice.

Granted, the former can be quite a bit more severe - but that’s why we should do things like build on dedicated servers with restricted access to the internet etc.

There are older versions (no new features) and then there are older versions (no security updates). Most security updates don't break compatibility and can be installed without modifying anything that takes that version as a dependency.

This works as long as compatibility-breaking changes are kept rare so that you can feasibly have someone doing security updates for each of the incompatible versions.

There are solutions to this - things like Dependabot go a long way and can integrate pretty seamlessly into your existing Git workflow.
Don't supply chain attacks only matter for libraries that can make their own network calls, or libraries that directly touch unsanitized web input, though?
Supply chain attacks matter anytime you need to trust the code being run. Which is usually always.

Most libraries have network access, but even if they didn't, supply chain attacks could be relevant (but probably less generic).

How does one restrict network access for a library?
That was my thinking. In Austral that is a thing, but in C++ not so much. And if you are building the dependency from source, which is pretty common, the common build systems are all Turing complete themselves so they can take over your pipeline and do bad things.
Run your CI in an allowlist-only network and only allow access to your private, security-scanned mirror or to other well-trusted sources. Even if a bitcoin miner gets into the stack, it can’t send the results back to the source, so it is less dangerous.
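
As a rough illustration of the allowlist idea, here is a small sketch of the policy check only. Real enforcement belongs at the network layer (firewall or proxy), and the host name below is a made-up placeholder.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; substitute your own mirror's host.
ALLOWED_HOSTS = frozenset({"mirror.internal.example"})


def is_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """True only if the URL's host exactly matches an allowlisted host."""
    host = urlparse(url).hostname
    return host is not None and host in allowed_hosts


print(is_allowed("https://mirror.internal.example/pkgs/libfoo.tar.gz"))
print(is_allowed("https://pool.example.org/submit"))
```

Exact host matching is deliberate here: suffix or substring matching is easy to get wrong (for example, a host like `mirror.internal.example.evil.test`).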
> We should be clear... in these large organizations... HEAD is always broken.

I don't know about all of them, but a lot of them have CI set up such that HEAD is never broken for some definition of broken.

Practically speaking it's basically impossible to fuzz test everything, but typically builds and reasonably fast and reliable tests are run before HEAD includes new changes.

Being efficient about supporting this workflow is more or less what monorepo build systems like bazel, buck, and pants were designed for.
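
A toy sketch of that gating, assuming HEAD is just an ordered list of change names and each check is a function over the proposed state. Everything here is hypothetical and far simpler than what bazel-style tooling actually does.

```python
from typing import Callable, Iterable


def advance_head(
    head: list[str],
    candidate: str,
    checks: Iterable[Callable[[list[str]], bool]],
) -> bool:
    """Append candidate to head only if every check passes on the result."""
    proposed = head + [candidate]
    if all(check(proposed) for check in checks):
        head.append(candidate)
        return True
    return False


# Stand-in checks: a real setup would run the build and the fast tests.
def builds(changes: list[str]) -> bool:
    return "breaks-build" not in changes


def fast_tests_pass(changes: list[str]) -> bool:
    return "fails-tests" not in changes


head = ["change-1"]
accepted = advance_head(head, "change-2", [builds, fast_tests_pass])
rejected = advance_head(head, "fails-tests", [builds, fast_tests_pass])
print(head)  # ['change-1', 'change-2']
```

HEAD here is only "never broken" with respect to the checks that gate it, which is the "for some definition of broken" caveat.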

> it has the advantage of being broken for everyone, tested by everyone, fixed by everyone

The correct word here is not advantage but risk.

While making a product you rely on something, some other product/package. Thousands of them if you are unlucky. If you have to collaborate in the development/testing/QA of 1000 packages, you will do nothing else! Also, try building a house on concrete that is still maturing, with tools that are not ready yet, I dare you! Careful developers rely on reliable things. It is a shame that this is not available in this industry. Released(!) things are either not ready yet or already in decline; the sweet spot is tiny.

It depends on the context: in a React Native project you are almost guaranteed to realize the hard way that you depend on abandonware that won't ever support the latest React Native version or the latest smartphone operating systems, while in a Python project you are almost guaranteed to attempt to use something that is hard to install or compile, or just doesn't work on your slightly divergent platform (e.g. Windows).
You don’t just say “okay C++20 is released, swap the compiler flags and break everyone.”

But you do say “okay, C++20 is released, let’s fix all the build errors and deploy it company-wide.”

This. Years of experience have taught me to avoid "living at HEAD". HEAD has undiscovered problems. Better to be just a bit behind that.
Google concludes the way they’ve always done things is the only way that scales, news at 11. Staying at HEAD can be quite nice in some respects, but the conclusion there only applies to Google, and only then the Google that was actively created to be amenable to the results you’re seeing.
Note that "large scale" is pulling a heavy weight here.

Also, living at the HEAD of your language standard is quite a bit different from living at the HEAD of other dependencies.

It's not just the language standard, they build and deploy the latest clang HEAD every week or so
Yes I realize. I didn't claim otherwise. The original post was about dropping support for older C++ standards. I was pointing out that Abseil's proposition is quite a bit more heavy handed than Boost's.
This somehow reminds me of the argument around git-flow. It's a decent reasoning, however, the whole idea is based on having a single, bleeding edge version. Basically a SaaS.

A lot of companies/products do not work that way. Some have physical products out there that have to be updated, some have on-premises deployments, some sell user software of which there are multiple versions under support. Each of these live versions has to have its own source branches and dependency trees. A single `:latest` can render future bugfixes unbuildable.

Doesn't that require you have a Google sized army of engineers and tooling to stay in sync?
I do it pretty much on my own for https://ossia.io ; it takes me roughly a day every other month to update my mac, linux and windows SDKs to the new LLVM / Qt / FFMPEG / {... other large dependency I use ...}.

Definitely not the end of the world. Said SDKs & build scripts are available here: https://github.com/ossia/sdk for anyone interested (I'll be honest though: the scripts are a mess!)

1 day for... ~500 kLOC, it seems. Which is pretty good I think, but larger codebases would have a lot more to deal with -- not just more dependencies, but also more breakages per dependency.
Or you just checkout the version that does what you need and leave it alone.
google3 has a one version rule. You can’t just check out the version of a dependency that does what you want.
That works for Google web, where they can upgrade everything at will. When you work in embedded, upgrades can cost millions of dollars since someone needs to go to remote locations to run the upgrade.
Who is "everyone" here? It sounds like "everyone" is just the developers.
Is "living at the head" constantly pulling each update or version-controlled source of every library?
I'm not sure I understand: what does "living at the head" mean?
There are various ways to implement this, but to simplify the explanation, assume that versioning always works via lockfiles [1]. Lockfiles record all of the versions of all of the projects that contributed to some experiment or release. They're common in various language-specific ecosystems like gem, pip, cargo, and so on.

Assuming you have these lockfiles, you have the typical option, which would be to make a lockfile for each released entity, record it for later reproduction, and update it at least every release.

The "live at head" approach would be to instead have a main shared lockfile for every project in the company in a recorded sequence. All projects pick a version of that lockfile to release from. Practically speaking, all projects probably just take the latest version (the head) of that lockfile and everyone works hard to make sure that lockfile always works for everyone.

The main advantage here is pretty straightforward combinatorial math. Maintaining and validating unique combinations of dependencies for every release in a codebase is NP-silly, whereas sharing one set of dependencies across as many applications as possible isn't easy but it has a much nicer cost curve. In theory at least, but a lot of large organizations claim practice backs the theory up as well.

[1] Versioning doesn't have to work this way. Putting all code into one big source repository (vendoring) has the same effect.
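
A small sketch of the difference, under the simplifying assumption that every project uses every dependency. All project names, library names, and versions are invented for illustration.

```python
def distinct_dependency_sets(lockfiles: dict[str, dict[str, str]]) -> int:
    """Count the distinct dependency -> version combinations across projects.
    Each distinct combination has to be validated separately."""
    return len({frozenset(lock.items()) for lock in lockfiles.values()})


# Typical approach: every project keeps its own lockfile.
per_project = {
    "app_a": {"libfoo": "1.2", "libbar": "3.0"},
    "app_b": {"libfoo": "1.3", "libbar": "3.0"},
    "app_c": {"libfoo": "1.2", "libbar": "3.1"},
}

# "Live at head": one shared lockfile with a recorded history;
# every project releases from the latest entry.
shared_history = [
    {"libfoo": "1.2", "libbar": "3.0"},
    {"libfoo": "1.3", "libbar": "3.1"},
]
head = shared_history[-1]
at_head = {project: head for project in per_project}

print(distinct_dependency_sets(per_project))  # 3
print(distinct_dependency_sets(at_head))      # 1
```

With d dependencies and v versions of each in circulation, independent pinning allows up to v**d combinations, while the shared lockfile leaves one combination per head revision to keep working.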

Does that include the head of GCC/LLVM?
It does. Google mirrors every commit to LLVM in their monorepo, builds and tests the whole monorepo with a fresh Clang nightly, and (ideally) one of those Clang nightlies is released as the new stable compiler for all users of the monorepo every week. This helps keep Google at HEAD and helps keep LLVM upstream stable.
I thought llvm lived in /third_party, and internal projects usually target specific LLVM versions, not the current open source trunk.
As far as I know, there was only ever one version of LLVM at a time in the monorepo. It's possible things changed after I left (either the compiler team or Google).

Each upstream commit didn't land directly into the monorepo, instead there was a long lived branch, and on the compiler team there was a buildcop rotation responsible for doing an integrate from that branch into //third_party/llvm. This included running the tests (and fixing any problems) for any other software that depends on LLVM as well as building an unstable crosstool and doing some basic smoke tests on that. Taking that crosstool through testing and to stable crosstool was the responsibility of a different buildcop rotation, using a special compiler team tool for testing the testing crosstool nightly, then to release to stable we used the ordinary presubmits, but for all projects at once, making its testing as similar as possible to any normal code change.

How does Google deal with other projects' embedded code copies? ie some other project embeds a random old snapshot of LLVM, when you import that project into the monorepo do you end up with two copies of LLVM, or do you strip the old copy and port the codebase to the current LLVM in the monorepo?
"Head" at Google means "head of the google3 monorepo". I gather they stay close to upstream for GCC and LLVM. Alsonote that they aren't necessarily matching upstream offerings with respect to C++ standards flags. They aren't necessarily matching upstream for various third-party libraries either. "Head" for third-party libraries means "the latest vendored in from upstream", which doesn't have to be particularly new.

In contrast, third-party projects like Boost would probably consider the latest commit on the HEADs of their git repos to be "head". So "live at head" is a statement about how each organization should version its dependencies. It doesn't really make sense in the context of Boost maintainers deciding their support surfaces since they have all the "heads" to worry about, inherently -- all the organizations using Boost libraries.