Hacker News new | ask | show | jobs
by edejong 2884 days ago
The key here is reverse dependency management. “If I change X, what would influence this change?”.

This can be achieved with single repo better than multi-repo due to the completeness of the (dependency) graph.

1 comments

Exactly this. Or at least it's a way this can be achieved, assuming solid testing & some tooling in the mix.

For folks unfamiliar with it, the issue is something like:

1. You find a bug in a library A.

2. Libraries B, C and D depend on A.

3. B, C and D in turn are used by various applications.

How do you fix a bug in A? Well, "normal" workflow would be something like: fix the bug in A, submit a PR, wait for a CI build, get the PR signed off, merge, wait for another CI build, cut a release of A. Bump versions in B, C and D, submit PRs, get them signed off, CI builds, cut a release of each. Now find all users of B, C and D, submit PRs, get them signed off, CI builds, cut more releases ...

Now imagine the same problem where dependency chains are a lot more than three levels deep. Then throw in a rat's nest of interdependencies so it's not some nice clean tree but some sprawling graph. Hundreds/thousands of repos owned by dozens/hundreds of teams.

See where this is going? A small change can take hours and hours just to make a fix. Remember this pain applies to every change you might need to make in any shared dependency. Bug fixes become a headache. Large-scale refactors are right out. Every project pays for earlier bad decisions. And all this ignores version incompatibilities because folks don't stay on the latest & greatest versions of things. Productivity grinds to a halt.

It's easy to think "oh, well that's just bad engineering", but there's more to it than that I think. It seems like most companies die young/small/simple & existing dependency management tooling doesn't really lend itself well to fast-paced internal change at scale.

So having run into this problem, folks like Google, Twitter, etc. use monorepos to help address some of this. Folks like Netflix stuck it out with the multi-repo thing, but lean on tooling [0] to automate some of the version bumping silliness. I think most companies that hit this problem just give up on sharing any meaningful amount of code & build silos at the organizational/process level. Each approach has its own pros & cons.

Again, it's easy to underestimate the pain when the company is young & able to move quickly. Once upon a time I was on the other side of this argument, arguing against a monorepo -- but now here I am effectively arguing the opposition's point. :)

[0] https://github.com/nebula-plugins/gradle-dependency-lock-plu...

> So having run into this problem, folks like Google, Twitter, etc. use monorepos to help address some of this.

I think you’re retroactively claiming that Google actively anticipated this in their choice at the beginning of using Perforce as an SCM. They may believe that it’s still the best option for them, but as I understand it, to make it work they bought a license to the Perforce source code forked it and practically rewrote it to work.

Here’s a tech talk Linus gave at Google in 2007: https://youtu.be/4XpnKHJAok8

My theory (I wonder if someone can confirm this), is that Google was under pressure at that point with team size and Perforce’s limitations. Git would have been an entirely different direction had they chosen to ditch p4 and instead use git. What would have happened in the Git space earlier if that had happened? Fun to think about... but maybe Go would have had a package manager earlier ;)

> I think you’re retroactively claiming that Google actively anticipated this in their choice at the beginning of using Perforce as an SCM.

Oh I didn't mean to imply exactly that, but really good point. I just meant that it seems like folks don't typically _anticipate_ these issues so much as they're forced into it by ossifying velocity in the face of sudden wild success. I know at least a few examples of this happening -- but you're right, those folks were using Git.

In Google's case, maybe it's simply that their centralized VCS led them down a certain path, their tooling grew out of that & they came to realize some of the benefits along the way. I'd be interested to know too. :)

Maybe Google’s choice for monorepo was pure chance. However, on many occasions the choice was challenged and these kinds of arguments were (successfully) made in order for it to stay.
There's a subtler, and potentially more important thing that can crop up with your scenario:

Library A realises that its interface could be improved, but it would not be backwards incompatible. In the best case scenario, with semver, there is a cost to this change. Users have to bump versions and rewrite code, maybe the maintainer of Library A has to keep 2 versions of a function to ease the pain for users. It may just be that B, C and D trust A less because the interface keeps changing. All this can mean an unconscious pressure to not change and improve interfaces, and adds pain when they do.

Doing it in a monorepo can mean that the developers of A can just go around and fix all the calls if they want to make the change, allowing for greater freedom to fix issues with interfaces between modules. And that is really important in large complex systems with interdependent pieces.