Hacker News
by gresrun 1217 days ago
Context: Staff Eng @ Google for 7+ years

1) This is solved by 2 interlocking concepts: comprehensive tests & pre-submit checks of those tests. Upgrading a version shouldn’t break anything because any breaking changes should be dealt with in the same change as the version bump.

2) Google’s monorepo allows visibility restrictions on build targets. Publicly visible targets are uncommon & reserved for truly public interfaces & packages.

3) “Code churn” is a very uncharitable description of day-to-day maintenance of an active codebase.

Google has invested heavily in infrastructural systems to facilitate the maintenance and execution of tests & code at scale. Monorepos are an organizational design choice which may not work for other teams. It does work at Google.
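To make point 1 concrete: a pre-submit check has to know which targets a change can break. Below is a minimal sketch of that idea as a walk over reverse dependencies. The target names and the `DEPS` map are made up for illustration, and this is not a description of how Google's actual tooling works.

```python
from collections import deque

# Hypothetical dependency graph: target -> targets it depends on directly.
DEPS = {
    "//app/mail:server": ["//lib/auth", "//third_party/json"],
    "//app/docs:server": ["//lib/auth"],
    "//lib/auth": ["//third_party/crypto"],
    "//third_party/json": [],
    "//third_party/crypto": [],
}


def reverse_deps(deps):
    """Invert the graph: target -> set of targets that depend on it directly."""
    rdeps = {target: set() for target in deps}
    for target, direct in deps.items():
        for dep in direct:
            rdeps.setdefault(dep, set()).add(target)
    return rdeps


def affected_targets(deps, changed):
    """Return the changed targets plus everything that transitively depends on them."""
    rdeps = reverse_deps(deps)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in rdeps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


if __name__ == "__main__":
    # A version bump of the crypto library must keep all of these green.
    for target in affected_targets(DEPS, ["//third_party/crypto"]):
        print(target)
```

Under this sketch, a version bump only lands if the tests of every affected target pass in the same change, which is the sense in which breaking changes get "dealt with in the same change as the version bump".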

6 comments

> any breaking changes should be dealt with in the same change as the version bump

Does this mean that some things will never get updated, as the effort required is impossibly high?

Effort to update something is high because there's a lot of code, not because it's in a monorepo. Updating the same code scattered across multiple repositories takes as much work in the best case. More realistically, some copy of the same code will stay unupdated because the cost to track down every repository in the company is too much.
Can definitely feel this pain personally. Need to upgrade tooling across some dozen or so services and we're investigating how to migrate with potentially incompatible upgrades. So just suffer an outage while we merge PRs across some 20 repos? The atomic changes of a monorepo are very beneficial in these cases, removing the manual orchestration of GitOps practices segmented across individual services.
When you say it's "as much work" there's an assumption the code is still used. This was years ago, but when I was doing migrations at Google we sometimes had to deal with abandoned or understaffed and barely maintained code. (Sometimes by deleting it, but it can be unclear whether code by some other team is still useful.)

If you're not responsible for fixing downstream dependencies then you don't need to spend any time figuring that out.

Sounds great to me because you are forced to delete code that's not in use anymore. Without the monorepo, that code would still be there with old libraries that are potentially insecure.

Deleting code that is not being used anymore happens way too rarely in my opinion.

The downside is that if a product no longer has maintainers, you are now encouraged to shut it down, even if it still works and doesn't cost much to run.
If a product no longer has maintainers, it's probably because it's not worth it for the company. So it makes sense to delete it, from the company's point of view.
In that case the product should still have maintainers. Even if only part-time, no software project should be completely unsupervised.
But with a multi-repo, it's possible to e.g. upgrade the dependency just for a single service that has an immediate need for the upgrade, isn't it?
The flip side is that services with an immediate need will get upgraded, and others won't, and six months later you will be saying "Why am I still seeing this bug in production, I already fixed it three times!"

Of course, the problem can be mitigated by a disciplined team that understands the importance of everybody being on the same page on which version of each library one should use. On the other hand, such a team will probably have little problem using monorepo in the first place.

Whether you have a monorepo or multiple repos, a good team will make it work, and a bad team will suck at it. But multiple repos do provide more ropes for inexperienced devs to tie themselves up, in my opinion.

I don't think that's quite true. In my experience multi-repos have the edge here.

If you have one key dependency update with a feature you need, but you need substantial code updates and 80 services depend on it, that may be impossible to pull off no matter what. Comparatively, upgrading one by one may not be easy, but at least it's possible.

The importance of everyone being on the same page with dependencies might just be a limitation of monorepos rather than a generally good thing. Some services might just not need the upgrade right now. Others may be getting deprecated soon, etc.

There are languages / runtimes where there could not be two different versions of the same thing in one binary (and they eagerly fail at build time / immediately crash upon run). That is not the case for JavaScript, Rust, etc. But it is the case for C++, Java, Go, Python and more.

Everyone claims different needs if they can. Nothing could be linked together anymore if you just let everyone use whatever they want.

Or maybe people start trying to work around this by ... reinventing the wheel (effectively forking and vendoring) to reduce their dependency graph.

There is a genuine need for a single instance of every third-party dependency. It is not unique to monorepos. A monorepo (with corresponding batch change tooling) just makes this feasible, so you don't hear about this concept for manyrepos, and mentally bind it to monorepos.
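A rough sketch of what checking for a "single instance" could look like across many repos, assuming each service publishes a map of pinned dependency versions. The service names, dependency names, and the `PINS` structure are all hypothetical.

```python
from collections import defaultdict

# Hypothetical pins collected from each service's repo: service -> {dependency: version}.
PINS = {
    "billing": {"json_lib": "3.2.0", "crypto_lib": "1.4.1"},
    "search": {"json_lib": "4.0.0", "crypto_lib": "1.4.1"},
    "mail": {"json_lib": "3.2.0"},
}


def version_conflicts(pins):
    """Return dependencies pinned to more than one version, with the services on each."""
    users = defaultdict(lambda: defaultdict(list))
    for service, deps in pins.items():
        for dep, version in deps.items():
            users[dep][version].append(service)
    return {
        dep: {version: sorted(services) for version, services in versions.items()}
        for dep, versions in users.items()
        if len(versions) > 1
    }


if __name__ == "__main__":
    for dep, versions in version_conflicts(PINS).items():
        print(dep, versions)
```

In a monorepo with one copy of each third-party dependency this report is empty by construction; with many repos, something like it has to be run and acted on deliberately.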

Lots of code under the assumption that all the code needs to use the same version*

A big bang always sucks versus some migration over time

No, you use automated systems to do the change. https://mobile.twitter.com/obeattie/status/10804969557537505...
Even so, the cost is often outrageously high.

Not to mention that if you're the first team to import a third_party library, you own it, and other teams can add arbitrary cost to you updating it. You have to be very aggressive with visibility and SLAs to work around this.

You're basically just describing all the pain with pulling in a dependency regardless of monorepo or not.

If the third party dependency does not add enough value to justify the cost then don't add it.

In a multi-repo setup you can upgrade gradually though, tackling the services that need the upgrade the most first. Can you do that in a monorepo setup?
> In a multi-repo setup you can upgrade gradually...

This also means services can be left to rot for years because they don't need to be upgraded, while all the infrastructure changes around them, which is a giant pain when you do eventually need to change something.

If you have a multi repo architecture you absolutely need both clear ownership of everything and well planned maintenance.

The total pain is the same though.
With multirepo setups, you don't necessarily need to update the package for all code at all.

Instead, some newer package completely replaces the old one, either with no relation to the old dependency or with a dependency on some future version, and both can run at the same time while the old one is being turned off.

Almost. We had a UI library on Android that was stuck on an alpha version of the library for three or so years after the library had shipped.

Upgrading the library broke many tests across the org, and no one wanted to own going in and getting each team to fix it. Eventually, the library had a v2 release, and people started to care about being able to use it.

Ultimately, they just forked the current release and appended a v2 to the package name.

Not the norm, but it happens. The monorepo works for Google, but I wouldn't recommend it for most organizations; we have a ton of custom tooling and headcount to keep things running smoothly.

From the mobile side, it makes it super easy for us to share code across the 50+ apps we have, manage vulnerabilities quicker, and collaborate easily across teams.

> The monorepo works for Google

Does it? Or is it stopping Google from supporting products which only make millions in revenue because of the massive burden of continually updating?

Oh geez, that's an entirely different can of worms that isn't related to the monorepo.

Most products at Google are not dropped because the monorepo makes it difficult for them to support - and I'm not sure how it would or how you got to that association. Also, plenty of products that are killed are not in the monorepo.

They are usually dropped due to a mix of things, but a big part is just better product management.

Better project management as in, somebody politicked their way into owning a replacement for a currently running thing?

The implemented product, as well as the vision, for something like Inbox or Google Music is still way better than Gmail and YouTube Music from an end user's point of view.

Fixed that:

>They are usually dropped due to a mix of things, but a big part is just worse product management.

Google is now at the point where their new projects fail (like Stadia) because they killed old products. Killing products has second-order effects.

> Ultimately, they just forked the current release and appended a v2 to the package name.

Hmm, does that explain the golang module versioning requirement where v2 must have a different name?

Yes.
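For readers unfamiliar with the rule being asked about: Go modules use semantic import versioning, where a module at major version 2 or higher carries a `/vN` suffix in its module path, so v1 and v2 are imported under different names. A minimal Python sketch of just that naming rule follows. The function name and the example path are hypothetical, and the sketch ignores the separate gopkg.in convention.

```python
def go_module_path(base_path: str, major: int) -> str:
    """Return the module path Go expects for a given major version.

    Major versions 0 and 1 use the base path unchanged; 2 and above
    append a "/vN" suffix, so each major version has a distinct path.
    """
    if major < 0:
        raise ValueError("major version must be non-negative")
    if major < 2:
        return base_path
    return f"{base_path}/v{major}"


if __name__ == "__main__":
    print(go_module_path("example.com/widgets", 1))
    print(go_module_path("example.com/widgets", 2))
```

Because the two paths differ, one build can depend on both major versions at once, which is the same effect as the forked "v2" package described above.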
Google's software mostly uses dependencies already in the Google monorepo, so these issues don't crop up. The person/team working on library changes has to ensure that nothing breaks, or that downstream users are notified early on. Don't think this would apply to many companies.
That sounds like a huge amount of effort unrelated to your current project, both for those being forced to upgrade and for those organizing the upgrade.
It’s not really even a true monorepo. Little known feature - there is a versions map which pins major components like base or cfs. This breaks monorepo abstraction and makes full repo changes difficult, but keeps devs of individual components sane.
This was done away with years ago. Components are no more.

There are still a couple of things that develop on long-lived dev branches instead of directly at head, but my personal opinion is that the need for those things to do that is mostly overstated (and having sent them CLs in the past, it's deeply annoying).

>> 3. It encourages a ton of code churn with very low signal.

> 3) “Code churn” is a very uncharitable description of day-to-day maintenance of an active codebase.

Also implicit in the discussion is the fact that Google and other big tech companies base performance reviews on "impact" rather than arbitrary metrics like "number of PRs/LOCs per month". This provides a check on spending too much engineer time on maintenance PRs, since they have no (or very little) impact on your performance rating.

> based on "impact" rather than arbitrary metrics

Umm, from whatever I have seen in big tech "impact" is also fairly arbitrary. It all is based on how cozy one is with one's manager, skip manager, and so on. More accurate is "perception of impact".

Especially as it gets more and more nebulous at higher levels.

How do you deal with wanting to see the history, graph etc of just one sub-project? Does the tooling handle this?
I believe everything is tracked at the folder/file level and not a project level. I'm not sure there even is a concept of a project. But maybe someone can correct me.
There is a concept of a project. Though viewing change history is more organized around packages and files.
History for folders is visible in code search, it’s basically equivalent to what GitHub or Sourcegraph would give you. You can query dependencies from the build system. Anything beyond a couple levels deep is unlikely to load in any tools you have ;)
git log <directory> accomplishes this already.
Google uses Piper, and Perforce (well, g4) before that.
Is monorepo an important reason for Google to kill products? Or is it just my imagination?
Hi, unrelated to this, but since you are working at Google, were there actually "code red" meetings at Google concerning ChatGPT?