by lisper 1538 days ago
It's very simple: with a monorepo you always have access to everything you need, together with a ton of stuff you don't. Whether or not this is advantageous boils down to whether the cost of not having access to something you need is greater than the cost of having access to a bunch of stuff you don't. As long as your system is reasonably efficient at letting you select small subsets of everything you could potentially have access to, the cost of having access to a bunch of stuff you don't need is essentially zero. Perforce is good at that. Git isn't. So people who use Perforce tend to think that monorepos are good and people who use git don't. And they're both right.
6 comments

I don't think version control system is the major differentiator for whether people like monorepos or not, tbh. Having a good incremental build/test system is far more important to developer experience, IMHO.

The biggest dissonance when it comes to the purported benefits of monorepos is that a "good" monorepo generally assumes very good interface design skills across all teams, but in reality, the path of least resistance is tacking on more and more unique codepaths (e.g. forking/"rewriting" existing things), so in effect, likability often comes down to how well a team is able to isolate itself from global changes (by choosing stable/boring APIs, inventing their own abstractions in their own little corner, or what have you).

> Having a good incremental build/test system is far more important to developer experience, IMHO.

This is very true. Used poorly, a monorepo is a crutch that allows a team to pretend that stable interfaces, versioning, and boundaries don't matter. Sure, your team can (theoretically) build the universe from a single git clone. Now what happens when another team needs to deal with your mess? What happens when you add some external dependency and now you have to deal with all of those problems anyway?

[You also shouldn't use git submodules to solve this, because that's basically the same thing but with the added annoyance of git. You should publish your bloody packages. With version numbers. And changelogs. Real version numbers. Real changelogs. Written by humans].

The author mentions the complexity barrier in open source, and I think that's a really interesting observation, but at the same time I think that complexity is the reason free software is alive today. It is definitely overwhelming for newcomers when a project requires a whole bunch of specific pieces that are all from different places. But once you've gotten past that, collaboration between a diverse range of people and organizations becomes an obvious and practical thing instead of a major undertaking. People don't all go off and write their own things from scratch[1], or clone code from place to place because it's too annoying to reuse it properly. Something internal feels similar to something external, which reinforces collective ownership.

Consider that Chromium includes its own everything, takes half a day to build from source, and is decidedly not a community-run project. Debian, meanwhile, is the polar opposite of a monorepo and continues to be alive and well without the oppressive shadow of a single 600 ton schizophrenic gorilla.

I think a lot of the time a team just wants a monorepo because they want a one-stop shop for fetching and building all of the things because internal dependencies are difficult. If that is the case, I think it's always worth considering something like BuildStream. It lets you specify where things are and how to build them, and it provides some useful tools on top of that. It doesn't solve brute-forcing a change across multitudes of applications, but it lowers the barrier to entry, it forces developers to care about deployment once in a while, and it can certainly help you to spot the integration issues when you change an interface without telling anyone.

[1] People will laugh at me for saying that from an operating system known for having more window managers than there are text editors, but really, have you seen some proprietary software projects?

Honestly I would 100% do a monorepo every single time if there was good tooling for incrementally building and testing libraries. Having to rebuild every image from scratch for every single change scales miserably. Things like Bazel exist, but you basically have to have a team dedicated to operating it (maybe the difficulty varies by language, but it was a major pain when I tried to use it to build some relatively simple Python projects a few years ago).
This isn't really true anymore, in my experience. I've used Bazel with teams of 30-50 and no full-time maintainer, let alone a team.

Once the migration is done, all you need is a few people that do some Bazel gardening every few weeks, and it's certainly not a full time job. This can be someone that does operations (CI, deployments, etc) or a product/infrastructure engineer, or one of each. Github / Gitlab scale to all but the largest projects, and even then, you can just split into two or three "monorepos" and kick the can down the road. With things like BuildBuddy, it's even easier.

As the article states, there are a lot of little hidden costs and paper cuts when using a many-repo layout. The one that I've seen that's most prevalent is that it obscures copy/paste behavior, since it's much more difficult to detect in a many-repo setup.

Going to Bazel or equivalent is a bit of a mind adjustment, and some languages are better supported than others, but it really starts to pay off in larger projects. Especially if there's more than a few languages in use.

I have personally converted build systems to Bazel, and use it for personal projects as well.

Bazel 1.0 was released in October 2019. If you were using it "a few years ago", I'm guessing you were using a pre-1.0 version. There's not some cutoff where Bazel magically got easy to use, and I still wouldn't describe it as "easy", but the problem it solves is hard to solve well, and the community support for Bazel has gotten a lot better over the past years.

https://github.com/bazelbuild/rules_python

The difficulty and complexity of using Bazel is highly variable. I've seen some projects where using Bazel is just super simple and easy, and some projects where using Bazel required a massive effort (custom toolchains and the like).

My understanding is that the Python ecosystem is notoriously difficult to integrate w/ Bazel. Javascript is another ecosystem with a lot of fast and loose stuff going on during installs. Golang integration is way better. At work, we use wrappers over bazel (e.g. gazelle) mostly to handle things like auto-generation of BUILD files by parsing source code import declarations and the like. This takes most of the friction away, to the point that many folks don't actually need to understand Bazel to any significant degree.
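To make the "auto-generation of BUILD files by parsing source code import declarations" idea concrete, here's a rough sketch in Python. This is not how gazelle itself is implemented; it only shows the general shape of the idea. The `MODULE_TO_TARGET` table and the target names in it are made up for illustration.

```python
import ast

# Hypothetical mapping from an imported module to the build target providing it.
# A real tool would derive this from the repo layout and lockfiles.
MODULE_TO_TARGET = {
    "requests": "@pip//requests",
    "mylib": "//libs/mylib:mylib",
}


def imported_modules(source):
    """Return the top-level module names imported by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            modules.add(node.module.split(".")[0])
    return modules


def deps_for(source):
    """Return the sorted build targets for the imports we know how to map."""
    return sorted(
        MODULE_TO_TARGET[m] for m in imported_modules(source) if m in MODULE_TO_TARGET
    )


EXAMPLE = "import os\nimport requests\nfrom mylib.util import helper\n"
print(deps_for(EXAMPLE))
```

Standard-library imports like `os` simply have no entry in the table, so they produce no dependency.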
I think Go works so well with Bazel, because Bazel's concept of versions and modular dependency trees is very similar to Go's.

Python and JS have come a long way in this regard, but depending on the libraries you're using, these languages are still way behind Go/Bazel's standards.

If you're using python, and are scared to jump straight to bazel, poetry is a good in-between tool. More forgiving than Bazel, but strict enough to make a future Bazel migration much less painful.

I've mostly pivoted out of Python and into Go. Would be interested in a writeup of Go/Bazel if you have any recommendations.
Is there, say, IntelliJ support for Bazel? Do you need a central server?

I've heard bazel is a bear...

But... all mature build systems are, because they become essentially enterprise workflow engines, process execution engines, internal systems integration hubs, and schedulers. Why? Because that's what an enterprise/mature build system is, it only differs from other software packages with the same capabilities in that it concentrates on build / deploy / CI "business tasks".

My current employer uses Jenkins (which has workflows/pipelines, daemons) and then feeds into Spinnaker (which has a full DAG workflow engine and interface) and likely this is pretty close to a "standard" or "best of breed" cloud build CI system. Of course there is a dedicated team.

Oh and of course the gradle code build in github has its own pipelines and general Turing machine to do whatever you want.

> Is there, say, IntelliJ support for Bazel?

Seems like JetBrains has recently committed to some amount of 1st-party Bazel support: https://twitter.com/ebenwert/status/1506683612518887425

And there's now a component in their tracker: https://hub.jetbrains.com/projects/Bazel

Yeah, in my experience Bazel is painful, but I've prototyped my own build system before. There's no fundamental reason it has to be a pain, I think Bazel just made really strange decisions and didn't pay much thought to helping people find the happy path. When I looked into it at least, the documentation seemed to assume you've used Bazel or something like it before.

Jenkins isn't a solution because it doesn't understand the dependency graph and can't help you with things like incremental rebuilds. It's just a task runner component, which a distributed build tool would probably offer out of the box.
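For anyone wondering what "understanding the dependency graph" buys you, here's a minimal sketch, assuming a toy graph with made-up target names: given the targets that changed, walk the reverse edges to find everything that needs rebuilding or retesting. A plain task runner has no such graph, so it has to rerun everything.

```python
from collections import deque

# Hypothetical graph: each target lists the targets it depends on.
DEPS = {
    "//app:server": ["//lib:auth", "//lib:db"],
    "//app:cli": ["//lib:db"],
    "//lib:auth": ["//lib:crypto"],
    "//lib:db": [],
    "//lib:crypto": [],
}


def reverse_graph(deps):
    """Invert the graph so each target maps to the targets that depend on it."""
    rev = {target: [] for target in deps}
    for target, target_deps in deps.items():
        for dep in target_deps:
            rev.setdefault(dep, []).append(target)
    return rev


def affected(deps, changed):
    """Return the changed targets plus everything that transitively depends on them."""
    rev = reverse_graph(deps)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in rev.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(affected(DEPS, ["//lib:crypto"])))
```

Here a change to `//lib:crypto` touches `//lib:auth` and `//app:server` but leaves `//app:cli` and `//lib:db` alone.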

I would be interested in understanding what you think Bazel's strange decisions are. Prior to Bazel, I had also used some of my own custom build systems--I was multiple rewrites into it--and I had independently come to some of the same conclusions as Bazel, such as the need for target syntax that separates the package name from the target name within the package (like how Bazel specifies targets as //dir/abc:def).
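As a small illustration of that package/target split, here's a simplified parser for labels of the `//dir/abc:def` form. It's a sketch only: it ignores repository prefixes like `@repo` and relative labels, and `Label` and `parse_label` are names I made up. The one Bazel rule it does reflect is that `//dir/abc` is shorthand for `//dir/abc:abc`.

```python
from typing import NamedTuple


class Label(NamedTuple):
    package: str  # directory path of the package, e.g. "dir/abc"
    name: str     # target name within the package, e.g. "def"


def parse_label(text):
    """Parse an absolute label such as //dir/abc:def (simplified sketch)."""
    if not text.startswith("//"):
        raise ValueError(f"expected an absolute label starting with //: {text!r}")
    package, sep, name = text[2:].partition(":")
    if not sep:
        # No explicit target: the name defaults to the last package component.
        name = package.rsplit("/", 1)[-1]
    if not name:
        raise ValueError(f"label has no target name: {text!r}")
    return Label(package, name)


print(parse_label("//dir/abc:def"))
print(parse_label("//dir/abc"))
```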

There are a number of other Bazel decisions that seemed strange until I tried to figure out how I would implement some particular feature.

My conclusion is that there are lots of small reasons why build systems are a pain in practice, and that the problem is a lot more complex than most people give it credit for.

> Is there, say, IntelliJ support for Bazel?

There is, but it heavily depends on language. It also has issues/makes choices with transitive resolves for things like Java/Kotlin, so you might have something that builds but the dependency is not resolved in IntelliJ for autocomplete.

> incrementally building and testing libraries.

Like Make?

Make is not very well suited to this problem for a lot of reasons. Perhaps the biggest and least controversial is that it's not hermetic. Makefiles often make assumptions about the build environment, so a build that succeeds for one contributor will fail for another. I'm sure others have done a much better analysis of make vs bazel than I could do here.
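One way to picture the hermeticity point: if a build step's cache key is computed only from its declared command, input contents, and environment, then two contributors with the same declared inputs get the same key, and anything undeclared (a tool that happens to be on someone's PATH, say) can't be allowed to influence the result. A rough sketch, with `action_key` and its parameters invented for illustration, not taken from Bazel:

```python
import hashlib


def action_key(command, inputs, env):
    """Hash everything a build step is declared to depend on.

    command: list of argument strings
    inputs:  dict mapping input path -> file contents (bytes)
    env:     dict of explicitly declared environment variables
    """
    h = hashlib.sha256()
    h.update(b"command\0")
    for arg in command:
        h.update(arg.encode() + b"\0")
    h.update(b"inputs\0")
    for path in sorted(inputs):  # sorted so dict ordering does not matter
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    h.update(b"env\0")
    for name in sorted(env):
        h.update(f"{name}={env[name]}".encode() + b"\0")
    return h.hexdigest()


key_a = action_key(["cc", "-c", "a.c"], {"a.c": b"int x;", "a.h": b""}, {"LANG": "C"})
key_b = action_key(["cc", "-c", "a.c"], {"a.h": b"", "a.c": b"int x;"}, {"LANG": "C"})
key_c = action_key(["cc", "-c", "a.c"], {"a.c": b"int y;", "a.h": b""}, {"LANG": "C"})
print(key_a == key_b, key_a == key_c)
```

Make, by contrast, decides what to rebuild from file timestamps and runs recipes in whatever environment the user happens to have.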
Did you hear about nx.dev?
I feel like the root problem companies run into when they don't have a monorepo is that shit gets locked down, e.g. I didn't even know this repo existed b/c I couldn't see it or clone it b/c of permissions. The other thing: let's say we have microservices and I need to call your service. Most places are terrible at documenting things, especially for a new service, which it probably is if I'm connecting to it for the first time. Now I have to figure out how to call your service. I can do that on my own by cloning your project, reading the code, and bugging you, but I'm way less likely to bug you if it's part of the mono and I just need to open the code in my existing project. Which leads to a second point: a mono does lead to more consistency and better knowledge sharing across codebases.
It's more than that. When you have to make changes that touch a lot of dependencies, it's much easier if all those dependencies

  - are in the same repo (making it easy to
    find and change all of them)

  - are in the same universe of build/test/deploy
    services (making integration of your changes
    atomic)
Atomicity of integration is essential, especially in organizations that move fast and make lots of breaking interface changes. Where you need to make a breaking interface change, it will be OK IFF you can make that change atomic.

Conversely, if you want to be able to make breaking interface changes, the integration and deployment of those has to be atomic.

Not having a monorepo & monobuild means that you have to have stringent interface backwards-compatibility commitments. That's fine if you're shipping an operating system, say, but it's usually too painful if you're not shipping anything to third parties.

For me, the atomicity feature is the killer feature of monorepos.

But you can never have true atomicity like that unless you pull in all of the source for all of your dependencies. That means, for most people, the Linux kernel and the standard gnu libraries and utilities. That's a lot of source code. And then you have to maintain all of those. If you're Google, you can do that. If you're a startup, probably not so much.
Correct, thus... monorepos.

Now, for Linux, the kernel<->user-land ABI is deemed stable, so you don't have to coordinate updates with the C (and other) run-times.

Other OSes did have the kernel and the C library in the same repository, so those have had the privilege of making their kernel<->user-land ABIs private. E.g., Solaris/Illumos, OS X.

Now, obviously if you have a monorepo for your startup, you might not include the Linux kernel in it mainly because you probably don't want your devs changing the kernel unless that's integral to the startup's purpose.

Deploying atomic changes is much harder than writing them. Having a host be updated atomically doesn't mean everything it communicates with has gotten the same change.
If there's no external linkage, then it's easy. If there is, then it's not. But usually the surface area of external linkage is much less than that of internal linkage. So, yes, there's value in this.
This is massively oversimplified.

- the cost of having access to more than you need: cognitive load and tooling for filtering, larger repositories require more tooling work to be performant

- there's also the atomicity of change and past changes which one can see/understand

How is that different from what I said?
Years ago, Google had a gcheckout tool that would trace the dependency information for whatever project you were working on, and then selectively grab the portions of the monorepo that you were going to need for it. Maybe they still have that or it's evolved into something else; I dunno, I haven't been there in a really long time.

Anyway, it seemed like such an obvious complement to the Perforce/monorepo style of working that I came away surprised that Perforce wouldn't have hoisted such a thing into their product as a first-class feature. Tracing dependencies across a lot of different build systems is obviously not trivial, but it's not intractable, particularly if the tool is pluggable so that orgs can provide modules to handle their own particular approach.
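The core of a tool like that is not much code once you have the dependency information. Here's a minimal sketch, assuming a made-up graph and target names (I'm not claiming this is how gcheckout worked): start from the project you care about, follow the dependencies transitively, and emit the directories you'd need in your client.

```python
# Hypothetical dependency information; a real tool would read it from build files.
DEPS = {
    "//search/frontend:server": ["//search/index:reader", "//base:logging"],
    "//search/index:reader": ["//base:io"],
    "//base:logging": [],
    "//base:io": [],
    "//ads/serving:server": ["//base:logging"],
}


def needed_dirs(deps, root):
    """Return the sorted directories holding root and its transitive dependencies."""
    stack = [root]
    seen = set()
    while stack:
        target = stack.pop()
        if target in seen:
            continue
        seen.add(target)
        stack.extend(deps.get(target, []))
    # "//search/index:reader" -> "search/index"
    return sorted({target[2:].split(":")[0] for target in seen})


print(needed_dirs(DEPS, "//search/frontend:server"))
```

The resulting list is the kind of thing you could feed into a client view or a sparse checkout; `ads/serving` never shows up because nothing on the path from the search frontend depends on it.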

What you are describing is an artifact of the old Perforce which copied everything in your client to local storage. After the conversion to srcfs and piper, which was more than a decade ago, this became unnecessary.

http://google-engtools.blogspot.com/2011/06/build-in-cloud-a...