by DanFeldman 2521 days ago
My team just switched to a monorepo. It's been only a few weeks, so I can't claim any results yet, but we've lived w/ the pain of poly-repo for long enough that we were ready to invest in a single repo.

We've spent a lot of time building and iterating on a unified CI/CD environment to support the new repo. Previously each project had its own test/deploy/build/publish story and usually its own Jenkins project. Now, each project is registered and triggers its own steps. Cross-project edits can happen in a single pull request. We have an incredible amount of integration tests (more so than unit tests), and getting them to work cross-project while migrating has been challenging.

We've gone from ~10-15 actively maintained repos to about 3 as we're slowly migrating. The repos hold a mix of services, libraries, and batch processing.

The author's point that forking and long-lived branching are incredibly difficult for most teams is really crucial. We're going to have to invest in education for new members about WHY we have a monorepo, what it means for your development, and how to change your perspective for developing at HEAD. I don't think 'bad' developers make it easier or harder. Instead, clearly articulating to developers which behaviors exist in a poly-repo vs mono-repo world is the differentiator.

These articles were absolutely crucial to developing our monorepo.

https://trunkbaseddevelopment.com/

http://blog.shippable.com/ci/cd-of-microservices-using-mono-...

https://www.godaddy.com/engineering/2018/06/05/cicd-best-pra...


> My team just switched to a monorepo.

I feel like this discussion is missing an appreciation for size/scope of repositories vs. size/scope of the organisation developing that software, with a pinch of appreciation for Conway's law.

If your team is a typical team of at most, say, 30 people, then maintaining 15 different repositories is clearly insane, but merging them into a single one likely doesn't truly deserve the moniker "monorepo", because it's just not that large (and varied in scope and purpose) of a project at the end of the day.

Think of it this way: the Linux kernel is certainly a larger project, but nobody thinks of it as a monorepo. Same thing goes for major software projects like Qt.

How do you handle building changes to just one of those projects? Can Jenkins do that (easily)?

I think that's the big thing that always puts me off monorepos... We'd basically be going from ten 5-minute builds to one 50-minute build if it wasn't possible to do incremental builds. IIRC Google and MS have purpose-built tools that do impact detection to work out what to build for their monorepos to keep build times down.

If you're doing a monorepo I think it's strongly implied that you'll also use a build system (Blaze/Brazil/BuildXL etc) that has granular compilation units and output caching so build time doesn't scale linearly with the company's total codebase.
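To make the output-caching idea concrete, here is a very rough sketch in bash. It is not how Blaze/Brazil/BuildXL work (they cache per-target artifacts and track dependencies); it only skips a project's build when none of its files changed since the last successful build. The names `project_hash`, `build_if_changed`, and `CACHE_DIR` are made up for illustration, and it assumes GNU coreutils (`sha256sum`, `sort -z`) and that builds write their outputs outside the project directory.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical cache location; keep it outside the project directories.
CACHE_DIR=${CACHE_DIR:-.build-cache}

# Print one hash covering the path and content of every file in a project.
project_hash() {
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z \
    | xargs -0 sha256sum | sha256sum | cut -d ' ' -f 1)
}

# Usage: build_if_changed PROJECT_DIR BUILD_COMMAND...
# Runs the build command only if no marker exists for the current hash.
build_if_changed() {
  local project=$1
  shift
  local hash marker
  hash=$(project_hash "$project")
  marker="$CACHE_DIR/${project//\//_}-$hash"
  if [[ -e $marker ]]; then
    echo "cache hit: $project"
    return 0
  fi
  # Record the marker only after the build command succeeds.
  "$@" || return 1
  mkdir -p "$CACHE_DIR"
  touch "$marker"
  echo "built: $project"
}
```

Something like `build_if_changed services/api make -C services/api` would then be a no-op on the second run. Note the limits: it ignores changes in a project's dependencies, and it stores a "was built" marker rather than the build outputs themselves.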

It's definitely important to consider before jumping in. Going from 5m to 50m compile times would be a major issue for me.

Err what is even the alternative? Even a makefile would provide incremental builds.
A good makefile provides incremental builds in every possible scenario of source file changes. It's very easy to write bad makefiles that don't catch modified or removed header files, and it gets even worse when code generation or other complicated build logic is involved, partly because a lot of developers seem to think "random build fail? oh just do make clean" is an acceptable workaround.
Good to know, thanks. Something I'll have to do a bit more reading up on.
It was a bit hacky, but we've basically implemented some of the stuff in [1] to achieve incremental builds. If a PR changes projects A, B, C and not X, Y, Z, then it will only build A, B, C. But it's not truly incremental right now, as it won't test things that depend on A/B/C.

We have plans to use Bazel in the future, but you have to boil the ocean when moving to Bazel and get everything ever inside Bazel before you get any benefit out of it.

Jenkins can't do it "easily" but it definitely can. I'd be happy to share our Jenkinsfile if you'd like.

Our change detection is something like:

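    #!/bin/bash
    set -euxo pipefail

    COMPARE_BRANCH=$1

    # Diff against the commit where this branch diverged from the compare branch,
    # keeping only files that live inside a directory (not at the repo root).
    MERGE_BASE=$(git merge-base "$COMPARE_BRANCH" HEAD)
    FILES_CHANGED=$(git diff --name-only "$MERGE_BASE" | grep '/')
    # Print the unique top-level directories, one per changed project.
    echo ${FILES_CHANGED} | xargs dirname | cut -d "/" -f 1 | sort | uniq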

[1] blog.shippable.com/ci/cd-of-microservices-using-mono-repos
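
For the missing piece (also building things that depend on A/B/C), one possible approach is to walk a dependency list in reverse. This is only a sketch, not what we run: it assumes a hypothetical `deps.txt` where each line is `<project> <dependency>`, a made-up function name `affected_projects`, and bash 4+ for associative arrays.

```shell
#!/bin/bash
set -euo pipefail

# Usage: affected_projects DEPS_FILE CHANGED_PROJECT...
# Prints the changed projects plus everything that depends on them,
# directly or transitively, one per line, sorted.
affected_projects() {
  local deps_file=$1
  shift
  if (($# == 0)); then
    return 0
  fi
  local -A seen=()
  local queue=("$@")
  local i=0 current project dependency
  # Breadth-first walk over the reversed dependency edges.
  while ((i < ${#queue[@]})); do
    current=${queue[i]}
    i=$((i + 1))
    if [[ -n ${seen[$current]:-} ]]; then
      continue
    fi
    seen[$current]=1
    # Queue every project that lists the current one as a dependency.
    while read -r project dependency; do
      if [[ $dependency == "$current" && -z ${seen[$project]:-} ]]; then
        queue+=("$project")
      fi
    done < "$deps_file"
  done
  printf '%s\n' "${!seen[@]}" | sort
}
```

You would feed it the directory names printed by the script above, e.g. `affected_projects deps.txt a b c`. It rereads the file for every project, which is fine for a few dozen projects but not something you'd want at thousands.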

If you have to do make clean in a monorepo you are pretty much toast. Tooling for impact detection and reliable makefiles that always succeed at incremental builds are absolutely crucial.

In a way this is one of the hallmarks of a monorepo: interfaces and dependencies change so quickly that it becomes too troublesome for humans to categorize (and re-categorize) them into repositories, so you let a machine (makefiles) do the work instead. And even without a monorepo you still have the same problem: eventually you will have to integrate all your mini repos into one final product, which you want to have tested. This is something you want to do as frequently as possible, ideally on every commit, not by doing major version-steps of sub-projects.

I suspect for that number of projects monorepos make a lot of sense.

The major technology organizations we hear about usually have at least several monorepos, due to the legacies of acquisitions and mergers if nothing else.

At the scale of thousands of subprojects, I am not entirely sure the benefits are as advertised. There will be support of subprojects forked to public github.com or gitlab.com if nothing else. And there will be external dependencies to manage; system level libraries like openssl and libc if nothing else. Even if they are vendored into the monorepo, any upstream regression is a significant problem in a monorepo... and the problem sometimes has to be solved in a big bang instead of incrementally.

Going from 15 to 3 is definitely a different discussion from going from 7000 to < 5.

At 15, it feels like it's kind of just a toss-up. We have several thousand repos, and sometimes we see 5-10 of them that really should be grouped, and we do so. Sometimes we see 1 repo that has 5-10 projects in it, and we break them down. Whatever works.

But when the entire org is on a dozen projects you're potentially in the worst of both worlds. Your repos aren't small enough or aligned with team ownership enough to really benefit from it. So it's straight overhead.

FWIW at least 95% (anecdotally) of Facebook’s main code is in two gigantic monorepos: fbsource and www. (The other major repos are for configuration-related stuff).

Last I heard there were plans to move www into fbsource.

There are certainly not random dependencies on public GitHub pages. Everything is versioned.

There is a mind boggling amount of custom tooling to make this work.

The versioned external dependencies work for systems that support semantic versioning. Some dynamic languages. Some C. Definitely nothing with a non-C ABI.

But "not working" looks like fixing an unknown number of bugs across the various subrepos, because permanently forking upstream or never applying security patches isn't a good business model.