Hacker News
by howinator 3316 days ago
So, this is actually pretty common. I know that both Google and Facebook use a huge mono-repo for literally everything (except I think Facebook split out their Android code into a separate repo?). So, all of Facebook's and Google's code for front-end, back-end, tools, infrastructure, literally everything, lives in one repo.

It's news to me that Windows decided to go that route too. Personally, I think submodules and git subtrees suck, so I'm all for putting things in a monorepo.

How does a mono-repo company manage open sourcing a single part of their infrastructure if things are in one large repo? For example, if everything lived in one repo, how does Facebook manage open sourcing React? Or if I personally wanted to switch to one private mono-repo, how would I share individual projects easily?
So open sourcing can mean three different things:

The bad one is just dumping snapshots of the code into a public repo every so often. You need to make sure your dependencies are open source, have tools that rewrite your code and build files accordingly, and put them in a staging directory for publishing.

The good one is developing that part publicly, and importing it periodically into your internal monorepo with the same process (or a similar one) that you use for importing any other third-party library.

There's also a hybrid approach: let internal developers use the internal tooling against the internal version of the code, and let external developers use external tooling against the public version. That one's harder, and you need a tool that does bidirectional syncing of change requests, commits, and issues.

We have an internal tool that allows us to mirror subdirectories of our monorepo into individual github repositories, and another tool that helps us sync our internal source code review tool with PRs etc.
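To make the mirroring idea concrete, here's a minimal sketch of just the path-mapping step, assuming a plain list of monorepo file paths. It is not the internal tool described above; `mirror_paths` and the example file names are made up for illustration, and a real tool would also have to carry over history, authorship, and review state.

```python
from pathlib import PurePosixPath


def mirror_paths(monorepo_files, subdir):
    """Map monorepo file paths under `subdir` to paths in a standalone repo.

    Files outside `subdir` are dropped, so nothing else from the tree leaks.
    """
    root = PurePosixPath(subdir)
    mirrored = {}
    for path in monorepo_files:
        try:
            relative = PurePosixPath(path).relative_to(root)
        except ValueError:
            continue  # not under the mirrored subdirectory
        mirrored[path] = str(relative)
    return mirrored


# Hypothetical monorepo layout, for illustration only.
files = [
    "libs/widgets/README.md",
    "libs/widgets/src/button.py",
    "libs/widgets-internal/secret.py",
    "services/billing/main.py",
]
public = mirror_paths(files, "libs/widgets")
```

Matching on whole path components rather than a string prefix is deliberate: a sibling directory like `libs/widgets-internal` should not end up in the public mirror.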
An internal tool that manages commits between individual repos, etc.: doesn't that seem like a logical extension to git itself? A little like submodules, but with the ability to publish only parts of the source tree. Though maybe it would be impossible to keep things consistent and to avoid leaking information from the rest of the tree.
With difficulty.

No, seriously, that's the answer.

They have an internal mono-repo and public repos on GitHub that are mirrors of their mono-repo.
pros of a big repo:

-don't have to spend time thinking about defining interfaces

cons:

-history is full of crap you don't care about

-tests take forever to run

-tooling breaks down completely, though thanks to MS the limit has been raised considerably

Are the big monorepo companies actually waiting for global test suite completion for every change? I'd doubt that, I'm sure they're using intelligent tools to figure out what tests to actually run. Compute for testing is massively expensive at that scale so it's an obvious place to optimize
Google's build and testing system is smart about which tests to run, as you suspect, but it still has a very, very large footprint.
Right. My point is that the monorepo almost certainly isn't a problem in this regard.
You still have to do something about internal interfaces. The problem is that the moment you want to make a backwards-incompatible change to an internal interface now you have to go find users of it, and there go the benefits of GVFS... Or you can let the build and test system tell you what breaks (take a long coffee break, repeat as many times as it takes; could be many times). Or use something like OpenGrok to find all those uses in last night's index of the source.

Defining what portions of the OS you'll have to look in for such changes helps a great deal.

As to building and testing... the system has to get much better about detecting which tests will need to be re-run for any particular change. That's difficult, but you can get 95% of the way there easily enough.
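As a rough illustration of that kind of detection, here's a minimal sketch that walks a reverse dependency graph from the changed targets and keeps only the tests it reaches. The graph, the target names, and `affected_tests` are all hypothetical; this is not how any particular company's build system works, and real systems have to cope with dynamic dependencies that a static graph misses.

```python
from collections import deque


def affected_tests(changed, reverse_deps, tests):
    """Return the tests that transitively depend on any changed target."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        target = queue.popleft()
        for dependent in reverse_deps.get(target, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen & set(tests))


# Hypothetical build graph: each key is depended on by the targets in its list.
reverse_deps = {
    "lib/strings": ["lib/http", "test/strings_test"],
    "lib/http": ["app/server", "test/http_test"],
    "app/server": ["test/server_test"],
    "lib/math": ["test/math_test"],
}
tests = {
    "test/strings_test",
    "test/http_test",
    "test/server_test",
    "test/math_test",
}

# A change to lib/http should not trigger the strings or math tests.
to_run = affected_tests(["lib/http"], reverse_deps, tests)
```

The same traversal also answers the "who uses this interface?" question: the non-test targets it reaches are the callers you'd have to fix.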

-don't have to spend time thinking about defining interfaces

That seems like a design and policy choice, orthogonal to repos.

Not really. It's easier to make a single atomic breaking change to how different components talk to each other if they are in the same repository.

If they are in different repos, the change is not atomic and you need to version interfaces or keep backwards compatibility in some other way.
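Here's a small sketch of what "versioning the interface" can look like when the two sides can't change in the same commit: the consumer accepts both the old and the new shape until every producer has migrated. The message format, the field names, and `parse_user` are invented for illustration.

```python
def parse_user(message):
    """Accept both versions of a hypothetical cross-repo message format.

    v1: {"name": "Ada Lovelace"}
    v2: {"version": 2, "first": "Ada", "last": "Lovelace"}
    """
    version = message.get("version", 1)  # v1 senders never set the field
    if version == 1:
        first, _, last = message["name"].partition(" ")
        return {"first": first, "last": last}
    if version == 2:
        return {"first": message["first"], "last": message["last"]}
    raise ValueError(f"unsupported message version: {version}")


# An old producer and a new producer are both understood during the migration.
old = parse_user({"name": "Ada Lovelace"})
new = parse_user({"version": 2, "first": "Ada", "last": "Lovelace"})
```

In a single repo you could change the producer and consumer together and skip the v1 branch entirely; across repos that branch has to stay until the last old sender is gone.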

It very much is. The fact that it's easier doesn't really matter: a repo is about access to the source code and its history with some degree of convenience. The process and policy of how you control actual change are quite orthogonal. You can have a single repo and enforce inter-module interfaces very strongly. You can have 20 repos and not enforce them at all. Same goes for builds, tests, history, etc. The underlying technology can influence the process, but it doesn't determine it.
I have always wondered how they deal with acquisitions and sales. I guess a single system makes sense there too.