Hacker News new | ask | show | jobs
by ori_b 2620 days ago
> For multirepo users this is explicit in that this comes for free :-)

Only if you spend the time to build tools to detect commits in your dependencies, as well as your dependent repositories, and figure out how to update and check them out on the appropriate builds.

So, no, it doesn't come for free.

3 comments

sorry, "this" is rather ambiguous.

You are totally correct that to achieve the same performance, correctness, and overall master level "green"ness in a multirepo system you would have to either define or detect dependencies and dependent repos, build the entire affected chain, and test the result. That part is much easier in monorepo.

What I was referring to with "this" is that Uber's method of detecting potential conflicts. In multirepo land it would be a "conflict" if two people commit to the same repo. In multirepo, therefore, detecting potential conflict is trivial.

If Bob commits to repo A and Sally commits to repo B, their commits can't result in a merge conflict. Well, unless the repos are circularly dependent - which would be bad :-) don't do that. Of course, monorepo makes that situation impossible so there's an advantage for monorepo.

It seems like whether you have mono- or multi- the problems solved by one choice will leave other problems the build system has to solve that it wouldn't have to solve if the other option were chosen.

Different work would be required in multirepo but it would be work to solve the problems that monorepo solves just by virtue of it being a monorepo.

> You are totally correct that to achieve the same performance, correctness, and overall master level "green"ness in a multirepo system you would have to either define or detect dependencies and dependent repos, build the entire affected chain, and test the result. That part is much easier in monorepo.

You also would need to do that as an atomic operation (in the mono-repo + especially with a commit queue you're building on the atomicity of git).

Having to unwinding that transaction if you aren't atomic can get you into a big mess at larger scales.

Here's a good related talk: C++ as a "Live at Head" language: https://www.youtube.com/watch?v=tISy7EJQPzI by Titus Winters (from Google).

thanks for the link - looking forward to watching.

Currently, I'm not convinced that you need to track and apply commits across dependencies and dependent systems atomically/transactionally to have a sane build environment even at scale, but you definitely get that part free with monorepo.

Any links to docs or presentations that address that specific issue would be very welcome :-)

> If Bob commits to repo A and Sally commits to repo B, their commits can't result in a merge conflict.

This holds true when A and B are leaf repos, but gets tricky with repos inside a dependency graph. More concretely, if C depends on both A and B, and it turns out that C depends on A and B in such a way that A_bob and B_sally are mutually incompatible, you need some kind of mechanism for reconciling that.

Of course, exactly as you point out, mono and multi are two tradeoffs for the problems that large codebases intrinsically are.

Package managers solve it quite well. Just depend on the latest version of your dependencies and tag a new version whenever they change.
This doesn’t work when an underlying system changes, and upgrading is mandatory for all clients or package dependants (happens often at scale for a multitude of reasons).
That's not good stewardship. You have a better API? Great, convince us it's worth investing in soon, you can even deprecate the known-good version.

There's always a window where both will be in use, because we can't synchronously replace every running process everywhere (not that it's even a good idea without a canary). The shorter you try to make that window, the more needless pain is created and plans disrupted. While we could use prod to beta test every single version of everything, that shouldn't be our priority.

That's not reality for most large companies, even though it's the right mindset for most software libraries. Ex: A new legal requirement comes in, resulting in a mandate that fields X and Y for certain queries, that are being done all over the codebase, now have to be tokenized. This is a breaking and mandatory change, with no room to allow systems to stay behind, and expensive consequences.

In this case you'll have a very short transition, between all consumers updating their client code (possibly in a backwards compatible way) and the change in the implicated system being deployed, not the other way around.

If adopting the new API is mandatory, every team should be told why it's mandatory, and we'll reprioritize and get it done. Doing it to our code behind our back is passive-aggressive and likely to break stuff, because who is monitoring and reporting on a change in our system's behavior that we didn't even see?
This is essentially the problem go has had, precisely because you could only ever pull the latest from master.

Introduce a breaking change into a common library and now you have to update every other dependency to support it.

Not so bad in a monorepo. But when your codebases are distributed?

It's almost like semantic versioning exists for some reason
Or realize that it is an advantage to control your dependencies.
Now you have to desperately try to get people to upgrade every time an important change goes through. And you quickly live in a world where you need to maintain tons of versions of all your services.
Not exactly for free, but there are free tools that handle this job for you very nicely:

https://zuul-ci.org/