by captainmuon 1538 days ago
One upside of smaller repos that I rarely hear about is that they force you to think about versioning. If you have a monorepo, you often don't version individual components; you just have a master that always builds. If your product is a user-facing website, that is fine. But if you make releases, and have multiple components in different versions that have a stable API and are expected to work in different combinations, then it is a real hassle. Of course you can tag individual library versions in a monorepo, but that is not the path of least resistance.

One place I've worked at, the ATLAS experiment at CERN, migrated to a monorepo. It was not bad, although there were the usual problems with long checkout times. But it worked because we tended to version every single piece of software together in a big "release" anyway (to make scientific results reproducible).


This almost feels like a version of Conway's Law: you inevitably ship the org structure.

Your dev tooling also influences the shape of the thing that you write. If you have a monorepo, it encourages you to ship a monolith that freely interoperates with itself. If you have multiple repos that need to be versioned against each other, you will ship components with more stable APIs.

So this means that if you ship a product whose parts customers are free to update at will, then using a monorepo will make things more difficult than necessary.

And if you ship a single unversioned monolith to the world, then using multiple repos adds unnecessary friction to working within the company.

Google, at one point, had component versioning that was not just "build everything from the latest commit". Libraries within the tree would get tagged releases, and everything else would build from the latest tag of those libraries.

This practice was abandoned, but I don't know why.
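
For anyone who hasn't seen that style of build: a rough sketch of "build against the latest tag" might look like the following. This is only an illustration, not how Google's tooling worked; the tag format (`v1.2.3`) and every function name here are made up.

```python
def parse_tag(tag: str) -> tuple[int, ...]:
    """Turn a tag such as 'v1.10.0' into (1, 10, 0) so tags compare numerically."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def latest_tag(tags: list[str]) -> str:
    """Pick the highest release tag, comparing by number rather than by string."""
    return max(tags, key=parse_tag)


def resolve_pins(library_tags: dict[str, list[str]]) -> dict[str, str]:
    """Map each library to the tag that dependents would build against.

    `library_tags` is a hypothetical mapping from library name to its release tags.
    """
    return {lib: latest_tag(tags) for lib, tags in library_tags.items()}
```

The point is that dependents only move when a library maintainer cuts a new tag, instead of on every commit to the library.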

People hated that they couldn't make atomic changes across components. Google's monorepo means everybody has to move in lock-step, which is bad for everybody:

* Library maintainers must make sure they don't introduce any regressions to any users at all. There's no major version number that you can increment to let people know that something has changed. Development necessarily slows.

* Library users must deal with any breakage in any library they use. Breakage can happen at any time because everybody effectively builds from HEAD. There are complicated systems in place for tracking ranges of bad CL numbers.

Monorepo isn't entirely to blame for this, but it certainly doesn't help. I've been at Google 15 years and I'm tired of this.

Question: Doesn't the same thing apply to managed services?

Let's say you want to make a change to the filesystem. You can change the client libraries today, but old client libraries are going to be in production for weeks, or longer. Your filesystem service has to be backwards compatible with some weeks or months of filesystem libraries.

Yes, and this is a good thing! If you're building infrastructure that people are going to come to rely on, it shouldn't change very much. Its interface certainly shouldn't change without very good reason.
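
To make the filesystem example concrete, here is a minimal sketch of a service that keeps accepting requests from older client libraries. Everything in it is hypothetical: the dict-shaped request, the `offset` and `length` fields, and the in-memory `FILES` table only stand in for a real service.

```python
# Hypothetical in-memory "filesystem service"; all names here are made up.
FILES = {"/etc/motd": b"hello world"}


def handle_read(request: dict) -> bytes:
    """Serve a read request from either an old or a new client library."""
    data = FILES[request["path"]]
    # 'offset' and 'length' are assumed to be fields that only newer clients send.
    # Older clients still in production omit them, so fall back to the old
    # whole-file behavior instead of rejecting the request.
    offset = request.get("offset", 0)
    length = request.get("length")
    end = len(data) if length is None else offset + length
    return data[offset:end]
```

An old client sending `{"path": "/etc/motd"}` still gets the whole file, while a new client can ask for a slice. The service has to carry that fallback for as long as old libraries are deployed.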

Another problem Google has is that people feel the need to change things in order to get promoted.

Regarding tracking bad CL ranges: Ecosystems (outside Google) which use versioned packages have the same requirement. If some version of a package you depend on has a bug, you might detect it yourself if you're lucky, but more likely you won't, so you need tools that centrally track known-bad versions and check whether your systems are affected. Package repositories support removing versions that are known to be bad for the same reason. Most of the attention in these areas is on security-related bugs right now, but that's really just a sub-category of the overall problem.

I don't think the bad-versions tracking outside Google is any less complicated than the bad-CL-ranges tracking inside Google.
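
Either way the core check has roughly the same shape. A minimal sketch, assuming bad ranges are recorded as "introduced at" and "fixed at" numbers (the `BadRange` type and its field names are invented for illustration and not taken from any real tool):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BadRange:
    first_bad: int    # CL (or release number) that introduced the bug
    first_fixed: int  # first CL known to contain the fix (exclusive bound)
    reason: str


def known_problems(built_at: int, ranges: list[BadRange]) -> list[str]:
    """Return the reasons for every known-bad range that covers this build."""
    return [r.reason for r in ranges if r.first_bad <= built_at < r.first_fixed]
```

Swap the integers for parsed package versions and the same check works for versioned packages. The hard part in both worlds is keeping the list of ranges complete and up to date.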

Your frustration has already been addressed in danluu's article under the header "Cross-project changes".

I believe he said that he wrote this article to avoid repeating the same convo again and again....

Right, smaller repos add more friction to dependencies, that is certain, but the flip side is that the friction enforces API boundaries and encourages thinking about the systems you build as SOLID components in their own right.

That friction sometimes helps: If it is painful to update Dependency A because it usually means upstreaming changes to A's Dependency B first, for instance, that can often indicate a tight-coupling problem that in a class diagram someone might easily discover and refactor over lunch but in a systems diagram was non-obvious without that "update hell" pain. Solving such tight-coupling problems is hard, and it may mean living with the pain for some time, and while monorepos make that pain go away they never solve those coupling problems (and arguably make it far easier to strongly couple systems that you likely don't want coupled). It's a lot like turning off all the Warnings in your compiler; it makes the immediate dev experience a lot nicer, but it risks missing things that while not problems now may be problems in the future.

I think there are also some benefits to using the same dependency managers for first-party components/libraries as for third-party components. The auto-updating of first-party versions is seen as a benefit of monorepos, but if recent and current CVEs have taught us anything, it is that you need to audit and update your third-party components quite regularly. Needing to update first-party components/libraries with the same dependency managers forces a regular dependency update cadence, which in turn puts more developer eyes on third-party updates. (Especially as more and more dependency managers pick up auto-auditing/security and CVE awareness tooling that runs on each update. Developers are more likely to look at those audit reports if they run frequently for both first-party and third-party components.) Dependency managers are their own friction in the process, but it is necessary friction for third-party components, and there are benefits to first-party components needing the same friction.

As with most software development practices there is no objectively "right" answer here. Monorepos have less friction in a large org. Friction and pain are sometimes useful tools, even though few people "want" them in their developer experience. Systems design is hard and tight coupling is often an easy solution. Looser coupling is often the better, more resilient design: it is easier to work with at the boundaries, and it gets you to the "I can trust this other team's repo to be a black box and they let me file bug reports as if they were a second-party vendor" level, which can be its own tool for avoiding mental fatigue.

My personal experience is that friction is a much larger problem at larger orgs than it is at smaller orgs. The friction was always much lower, day-to-day, at small orgs. (Small orgs front-load the friction somewhat... "Here, set up your development environment.")

That's a fair way to view it, and my experience is also that a lot of that friction at its worst tends to accrete around bureaucratic barriers and fiefdoms. That also plays out in its own way in the compiler warnings analogy: if there's a lot of friction touching a particular piece of code for bureaucratic reasons, often the bureaucracy doesn't go away in the monorepo case, it just disappears until it painfully shows back up later in the process. For instance, multi-repo may add a lot of friction to even finding/getting access to the repo in the first place, but once you have access after the bureaucratic red tape, PRs may be painless. Yet there are certainly all sorts of monorepo horror stories of making an "easy" PR and then finding that PR bogged down in a lot of politics as bureaucrats crawl out of the woodwork from PR pings (sometimes pings they themselves set and the PR creator isn't ever aware of until the PR is sent). The bureaucracy is much the same in both cases and the pain is very similar; in one case it is just front-loaded and obvious. (Everyone knows "Oh, Bob owns that repo. You need to fill out these forms, take them to the castle next door, and look for the ogre to give the forms to. That's Bob. Then you can make PRs to your heart's content if Bob likes you and doesn't eat you." versus a troll you never expected jumping out from under a bridge as you complete a PR and demanding a sacrifice of some goats before you may cross.)

yea that friction is good. i wrote some code. someone liked it and added a dependency of their app on my app. i needed to update my code - all of a sudden i was responsible for updating some other random app and ensuring it kept working - behavior we considered a bug and they didn't, so both code versions needed to work. the monorepo let them make an api where one was never intended to exist

Whether that's an upside depends. Mostly I think it's a downside.