Hacker News
by jnurmine 2520 days ago
Is the monorepo/multirepo choice really the most important thing to consider?

Branching: monorepo or not, if a feature-incomplete development branch for one of the supported targets can "hold the entire organization as a hostage" then the SCM people, and/or the persons responsible for the SCM policy, should do some introspection...

Why are deliveries done from a branch which is obviously still in development? Why does code-to-be-released need to depend on incomplete work? Why isn't something like "topic branches" used?

Modularity: monorepo or not, problems will certainly appear when the complexity of implementation outpaces the capacity created by the design. To get modularity, one needs actual modules with properly designed (=not brittle, DRY, KISS, YAGNI, SOLID, etc. etc.) interfaces between the modules. Now, does monorepo/multirepo really play a role here at all? If everyday changes are constantly modifying the module interfaces in incompatible ways that break existing code, this says something about the design, or rather the insufficiency of it.

Of course, every project and team is different. However, even if a locally optimal choice for the monorepo vs. multirepo question is found, problems existing regardless of monorepo/multirepo will still be there.

4 comments

I was wondering the same thing. Like, I use a microservice architecture at work, but the choice of using one vs. many git repos to represent diffs in those services over time seems largely meaningless. I don't like large diffs or people breaking production, but this is solved by testing and insisting upon small diffs, not by how many git repos we use.

Formally speaking, multi-repo management allows a strict subset of the diffs allowed to a mono-repo (because diffs can't extend beyond each repo root). Are the excluded possibilities all bad? No. Are they generally bad? Not really. Are they sometimes bad? Sure. Are they sometimes better than many diffs across many repos? Sure. Can a reasonably competent dev team tell the difference? Sure, usually. Unsurprisingly, this usually requires the exact same tooling as ensuring the quality of microrepo changes.
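To make the "strict subset" point concrete, here is a rough sketch (not anyone's actual tooling) that checks whether a monorepo diff could have been expressed as a single diff in a multi-repo layout. The repo roots and file paths are made up for illustration, and paths outside every root are simply ignored.

```python
from pathlib import PurePosixPath

def repos_touched(changed_paths, repo_roots):
    """Return the set of (hypothetical) repo roots that a diff touches."""
    touched = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        for root in repo_roots:
            root_parts = PurePosixPath(root).parts
            # A path belongs to a root if the root's components are its prefix.
            if parts[:len(root_parts)] == root_parts:
                touched.add(root)
                break
    return touched

def fits_in_one_repo(changed_paths, repo_roots):
    """True if the diff stays within a single repo root (or touches none)."""
    return len(repos_touched(changed_paths, repo_roots)) <= 1

# Hypothetical layout: each directory would be its own repo in a multi-repo setup.
ROOTS = ["services/billing", "services/auth", "libs/tracing"]

cross_cutting = ["services/billing/api.py", "libs/tracing/span.py"]
local_change = ["services/auth/login.py", "services/auth/tokens.py"]
```

Here `fits_in_one_repo(local_change, ROOTS)` holds while `fits_in_one_repo(cross_cutting, ROOTS)` does not: the second diff is one of the "excluded possibilities" that only a monorepo can express as one commit.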

If you're continuously deploying master, have a healthy ci/cd pipeline, and enforce good merging discipline, you're fine either way.

I'm a little tired of doing things like revving our trace and logging libraries across our 50+ micro repos that represent microservices. That's genuinely obnoxious. Is it bad? No. Is it obviously more or less error prone than the equivalent monorepo update? No. All the bad bits of either strategy just require some tooling and a clear head.

So far I've only managed to find one thing that monorepo fundamentally offers that micro does not: atomic commits across projects.

But I'm not sure that's a useful feature anyway:

1) If you are doing a whole-repo refactor (one of the main atomic-commit benefits I see claimed), you still have to run on X -> try to commit X+1. If someone committed in between you may have to redo the whole thing. Or lock the whole monorepo while doing so. Both scenarios seem worse to me for mono, since microrepos stand far less of a chance of conflicting (less frequent commits, less code to consider (faster refactoring tool runs), etc) and a lock would be a far smaller interruption (one repo vs the whole company).

2) Atomic commits don't represent how things are deployed. You still have to deal with version N and N-1 simultaneously. So e.g. breaking refactors of RPC APIs have exactly the same problems in mono vs micro.
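As a minimal sketch of point 2: whichever repo layout you use, a server at version N typically still has to accept requests from clients at N-1 while a rollout is in progress. The field names below (`user` renamed to `user_id`) are invented purely for illustration.

```python
def parse_request(payload: dict) -> dict:
    """Accept both the old (N-1) and new (N) request shapes.

    Hypothetical rename: the N-1 field 'user' became 'user_id' in N.
    """
    if "user_id" in payload:      # new shape, sent by version N clients
        user_id = payload["user_id"]
    elif "user" in payload:       # old shape, still sent by version N-1 clients
        user_id = payload["user"]
    else:
        raise ValueError("request has neither 'user_id' nor 'user'")
    return {"user_id": user_id}
```

The compatibility branch can only be deleted once no N-1 client is left running, and an atomic commit across projects doesn't change that.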

On the other hand, downsides are pretty clear and take immense work to sidestep: most tools will either be much slower or not work at all, because they now need to work on 100s or 1000s of times more data than they were developed against. That's probably thousands of man-years of tooling you may have to understand and improve, or wholly replace.

---

The vast majority of monorepo benefits that I usually see claimed are actually tool-standardization benefits. Or "we could build tool X to do that". Or top-level control, like "we can commit for team X". Of course that's useful! But it has nothing to do with monorepo vs microrepo.

Monorepo just happens to be the carrot/stick used to finally achieve standardization. Others could work, this is just the current fad (which, in some ways, is why it sometimes works - it's easier to convince others).

As the project scope and customer base continue to grow, the likelihood that you picked all the correct boundaries within the system drops to zero.

When the right boundaries reveal themselves, you can divide the code up. But who is to say those will still be the right ones in ten years?

If you divide the source code into separate repositories before getting the boundaries right, there's a tremendous amount of friction built into the system preventing the problem from being addressed. Each repository has its own actors, cycles, and version control history, and you break two of those when you start trying to move code across project boundaries. So people just hit things with a hammer or steal functionality (three modules with a function that ostensibly does the same thing but with different bugs).

One of the things I see over and over again is people conflating one repository with one lifecycle. One binary. It's possible to have a monorepo with multiple build artifacts. The first monorepo I ever worked on had 60 build artifacts, and it worked pretty well (the separate artifacts weeded out a lot of circular dependencies).
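For what it's worth, the circular-dependency point is easy to check mechanically once each artifact declares what it depends on. A rough sketch, assuming the dependencies are available as a plain dict (the artifact names are hypothetical, and this is not how that build actually did it):

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of artifact names, or None.

    deps maps an artifact name to the artifacts it depends on.
    Uses a depth-first search with the usual white/grey/black marking.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    state = {name: WHITE for name in deps}
    stack = []  # artifacts on the current DFS path

    def visit(node):
        state[node] = GREY
        stack.append(node)
        for dep in deps.get(node, ()):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path: we found a cycle.
                return stack[stack.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        state[node] = BLACK
        return None

    for name in deps:
        if state[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None

# Hypothetical artifacts in a single repo.
layered = {"api": ["auth"], "auth": ["common"], "common": []}
tangled = {"api": ["auth"], "auth": ["common"], "common": ["api"]}
```

`find_cycle(layered)` returns `None`, while `find_cycle(tangled)` reports the loop through `api`, `auth` and `common`. A build with separate artifacts forces this kind of check implicitly, because a cycle means no artifact can be built first.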

I can still get inter-version dependency sanity checks with a monorepo. When I am writing new code I can have everything talk to localhost (master@head) or I can have it talk to a shared dev cluster (last labelled version) or some of both, allowing me to test that I haven't created a situation where I can't deploy until I've already deployed.

I completely agree with your comment.

Monorepos will not save a company from its lack of discipline. But while you can have problems if you do stupid things in a monorepo, you will always have to deal with dependency hell and what comes with it on multirepos.