by Cedricgc 2439 days ago
The main driver of success in either model is in the tooling and practices invested in it to make it work in an organization. Google is successful with their monorepo because they have invested in building (blaze), source control (piper, code search), and commit to always developing on HEAD. Multirepo is currently easier for most companies because most public tooling (git, package manager) is built around multirepos. One place I see multirepos fall over is awful dependency management practices internally and in open source. Many dependencies quickly become outdated and are not updated in cadence, slowing down writers and consumers. Better tooling can help here but an organization needs real discipline to stay on top of things.
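To make the "not updated in cadence" problem concrete, here is a minimal sketch of a staleness report a multirepo shop might run. It assumes each consuming repo pins exactly one released version of an internal library. The repo names, the release list, and the `releases_behind` helper are all hypothetical.

```python
# Hypothetical data: released versions of one internal library, oldest first,
# and the version each consuming repo currently pins.
RELEASES = ["1.0.0", "1.1.0", "1.2.0", "2.0.0", "2.1.0"]
PINS = {
    "billing-service": "2.1.0",
    "web-frontend": "1.2.0",
    "batch-jobs": "1.0.0",
}


def releases_behind(releases, pins):
    """Map each consumer repo to how many releases its pin lags the latest.

    Raises ValueError if a repo pins a version that is not in `releases`.
    """
    latest_index = len(releases) - 1
    return {repo: latest_index - releases.index(pinned) for repo, pinned in pins.items()}


if __name__ == "__main__":
    for repo, lag in sorted(releases_behind(RELEASES, PINS).items()):
        print(f"{repo}: {lag} release(s) behind")
```

In a monorepo that commits to HEAD, every consumer's lag is zero by construction, which is the discipline the comment is describing.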

When I started at Google the tooling was not very good, and monorepo was pretty painful. They used perforce, and it simply couldn't keep up. Commits could take minutes. Code review was also unbearably slow. Blaze didn't exist yet; just before I started they had tools that generated million+ line makefiles that everyone hated. So yeah, you need good tooling, but even Google didn't have it for the first ~10 years of its existence.
I went to graduate school with a guy who ended up on Google's infrastructure team. He'd previously worked as head or lead dev for subversion (a little hazy on the details).

We worked on a small project where we put together statistical measures for codebases. It was a lot of fun, even if the infrastructure was out of my wheelhouse at the time.

Folks that can manage billion-line codebases are on a whole different level I think. I wonder sometimes how many folks like that there are.

EDIT: Looks like he left for a bit and is now back. Good on him!

I wonder why nobody has made a good public monorepo offering similar to what Google has internally. It would probably be a hit at many companies, since it fixes so many issues related to working in very large teams.
At large enough scale, it causes a lot of problems and breaks just about every other dev tool, requiring a lot of work to get things back together.

That said, there are some open source pieces to help. Facebook open sourced their mercurial stuff, so you can get version control at scale (before that, you'd just use perforce). Google open sourced bazel. Google also open sourced some parts of the underlying infra behind code search, but not enough for it to really work properly. And of course, at a lower level, there's a plethora of reasonable db offerings, etc.

It would still require a lot of glue though.

It is just that Google's tooling around code works really well together. Code search lets you view code and directory-based history, so you aren't swamped by others' commits. With tap, all unit tests run all the time, but projects are sectioned so you don't run every test on every presubmit. With sponge, every test log ever is gathered (even for the tests you run locally), so you can link full test logs to coworkers when you have problems; the source file links in those logs are actual links into code search. With critique, code reviews are versioned and easy: you can diff the code between any sets of comments to see its evolution, and it runs presubmit checks and shows sponge links for tests. Finally, blaze provides a structured, directory-based dependency management system that makes partial checkouts and distributed cached builds work well.
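A rough sketch of the idea behind not running every test on every presubmit: given a dependency graph like the one a blaze-style build system maintains, walk the reverse dependencies of the changed targets and run only the tests reached. This is a generic reverse-dependency walk, not tap's or blaze's actual algorithm; the graph, the target names, and the `_test` naming convention are all hypothetical.

```python
from collections import deque

# Hypothetical build graph: each target maps to the targets it depends on.
# Target names are made up for illustration.
DEPS = {
    "//base:strings": [],
    "//base:logging": ["//base:strings"],
    "//search:index": ["//base:logging"],
    "//search:index_test": ["//search:index"],
    "//maps:render": ["//base:strings"],
    "//maps:render_test": ["//maps:render"],
    "//ads:bidder_test": [],
}


def reverse_deps(deps):
    """Invert the graph so each target maps to the targets that depend on it."""
    rdeps = {target: set() for target in deps}
    for target, children in deps.items():
        for child in children:
            rdeps.setdefault(child, set()).add(target)
    return rdeps


def affected_tests(deps, changed):
    """Return test targets that are changed or transitively depend on a change."""
    rdeps = reverse_deps(deps)
    seen = set(changed)
    queue = deque(changed)
    # Breadth-first walk upward through the reverse-dependency edges.
    while queue:
        target = queue.popleft()
        for parent in rdeps.get(target, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    # Assumed convention for this sketch: test targets end in "_test".
    return sorted(t for t in seen if t.endswith("_test"))


if __name__ == "__main__":
    print(affected_tests(DEPS, ["//base:logging"]))
```

A change to `//base:logging` selects only `//search:index_test`, while `//ads:bidder_test` never runs because nothing it depends on changed. The directory-structured dependency declarations are what make this graph cheap to compute.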

I'd like a set of tightly coupled tools like that working outside of Google, but I guess it might be just a dream; it is a bit too big of a project.

> directory based commits

This whole thread is interesting because subversion is exactly this, and it works with large code bases. We used to have these tools and we moved away from them.
Subversion works with large code bases, but not crazy massive codebases.

The version control is just the tip of the iceberg, though, and it is largely a solved problem: git or subversion, then perforce or straight to Facebook's mercurial stuff.

It's the other tooling that breaks on large enough monorepos that you'll have a hard time with publicly. Searching your code takes a while. Cross-references don't go wide enough or take too long to generate. Builds take too long. Refactoring tools either take too long or don't support repos of sufficiently large size.

Mostly this isn't a problem though, because few repos are actually large enough to cause real problems.

Now that Git Virtual File System (VFS) is coming to GitHub, I think we shall see an uptick in monorepo adoption. That said, what I said earlier about compatible tooling still applies. The repo style affects so much of the tooling: continuous integration, deployment, building, versioning, etc.

It really is not a small difference.

Git VFS, which lets monorepos operate at enterprise scale (courtesy of Microsoft), is out there, as is Uber's white paper on using ML to predict whether branch merges will compile, an actual problem at enterprise scale. That doesn't cover every issue, but every (huge) company's workflow is different, so there's no "one size fits all" that's totally suitable; part of git's success is that it fits into whatever workflow a company already has. So I'm not sure the full suite of Google's tools would be a hit at all companies, especially ones smaller and less dedicated than Google. It takes one team to work on Bazel (and the internal Google project it springs from) and another to operate it; not all enterprises are willing to staff that work, and certainly not as heavily as Google does.
You know Google's monorepo is successful because they have so many of them! google3, Android, Chrome browser, ChromeOS...

Kidding aside, my point is Google recognizes obvious boundaries between e.g. their web stuff and android, and organizes their code accordingly.

Google uses no versioned libraries?
Libraries internal to Google are kept at the latest version, and consumers are updated to use the latest APIs. 3rd party libraries are checked into the monorepo at a specific version, and everyone uses the same version.
I think they do, but there is only ever one single version in use - the version in the repo.
Yes, and that prevents duplication of work when it's time to upgrade: https://github.com/microsoft/TypeScript/issues/33272
If so, they would have a massive problem upgrading libraries like numpy because there are too many and too big breaking changes between releases.
Inside Google's main repo there are different build targets for libraries with incompatible API changes that are too difficult to fix all at once, e.g. there might be numpy_1_8 and numpy_1_10 separately.
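A minimal sketch of how a one-version policy with explicit versioned exceptions might be checked, assuming third-party code is declared as (target, package, version) triples. The declarations and the `check_one_version` function are hypothetical; this is not Google's actual tooling. Each target name may carry only one version, and an incompatible release has to live under its own target, like the numpy_1_8 / numpy_1_10 split described above.

```python
from collections import defaultdict

# Hypothetical third-party declarations: (target name, package, version).
THIRD_PARTY = [
    ("numpy_1_8", "numpy", "1.8"),
    ("numpy_1_10", "numpy", "1.10"),
    ("six", "six", "1.12.0"),
]


def check_one_version(declarations):
    """Return (violations, split_packages) for a list of declarations.

    violations: target names declared with more than one version (not allowed).
    split_packages: packages deliberately kept at several versions, each under
    its own distinct target name (allowed, but worth tracking).
    """
    by_target = defaultdict(set)
    by_package = defaultdict(set)
    for target, package, version in declarations:
        by_target[target].add(version)
        by_package[package].add(version)
    violations = sorted(t for t, vs in by_target.items() if len(vs) > 1)
    split_packages = sorted(p for p, vs in by_package.items() if len(vs) > 1)
    return violations, split_packages


if __name__ == "__main__":
    print(check_one_version(THIRD_PARTY))
```

Keeping the split visible as separate targets makes the cost explicit: someone eventually has to migrate numpy_1_8 users and delete that target.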

Python at Google muddled along for years without numpy at all so it's not like anyone would be seriously harmed by having an old release in the repo.