The official reason is that the "internals of Git" weren't conducive to the kinds of invasive changes they needed/wanted. But I think the truth is closer to being that it was going to be too hard/slow to get those invasive changes past the Git mailing list.
> The official reason is that the "internals of Git" weren't conducive to the kinds of invasive changes they needed/wanted. But I think the truth is closer to being that it was going to be too hard/slow to get those invasive changes past the Git mailing list.
Which is about the same thing. Mercurial was built to be at least somewhat pluggable, so Facebook could build their extensions independently and work to get a subset of them integrated into mainline. Git is designed so it can be built on top of, but not really under or within.
Of the five largest tech companies, only one has created a monorepo in Git since Google and Facebook chose Mercurial and started proving it out. I think it’s less about what the project was designed for and more about the community’s ability to appreciate the challenges facing these large companies. You can tell from the post because all the feedback is “have you tried doing something totally different?” rather than “that’s an interesting scaling challenge. How can we make Git perform better at scale? What properties would we have to trade off? How do we manage that tension, or is this really fundamentally incompatible, and if so, can we clearly communicate why it’s incompatible with the project’s goals?”
At a technical level, I believe both Google and Facebook engineers make core changes to Mercurial when they need to, so I don’t think that’s the root philosophical difference.
The mailing list piece is from 2012, and describes how Git is very slow on a synthetic repo with millions of files and commits. Today, my current place of work has a monorepo that’s approaching the size described in that mailing list post, but Git seems to be holding up just fine. If you check out a branch that’s far enough away from master it takes a minute, but add, rebase, commit, status and blame are all negligibly impacted speed-wise. The only issue we run into is non-conflicting pushes to master being rejected during peak hours, when maybe several dozen engineers are trying to merge and push to master simultaneously.
Does anybody have any insight into what’s changed in git internally since 2012 to support bigger repos?
I don’t think there is one single change that made a huge difference. I follow the changelogs posted on the mailing list, and of the performance-related changes, it’s often “we got a 3-5% speedup on this benchmark on this fs without making things worse on others”.
Over 8 years and tens of those changes, it adds up to a significant performance improvement.
Git works nicely on Linux for Chromium, with over 540K files spread across a few modules. On Mac and Windows it is merely tolerable: with git status taking 5 or more seconds, I started using “git status directory” to get more immediate feedback. And git blame can take more than a minute, so it is often better to look at the log and guess the changes from it.
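For anyone who wants to check whether limiting status to one directory helps on their own checkout, here is a rough sketch. The helper names `status_command` and `timed_status` are made up for illustration; the only Git features assumed are `git -C <repo>`, `status --short`, and a pathspec after `--`. How much time it saves, if any, will depend on the platform and the repo.

```python
import subprocess
import time


def status_command(repo, subdir=None):
    """Build the git invocation; subdir (optional) restricts status to one path."""
    cmd = ["git", "-C", repo, "status", "--short"]
    if subdir is not None:
        # Everything after "--" is treated as a pathspec, relative to repo.
        cmd += ["--", subdir]
    return cmd


def timed_status(repo, subdir=None):
    """Run git status and return (stdout, elapsed_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(
        status_command(repo, subdir),
        capture_output=True,
        text=True,
        check=True,  # raise if git exits non-zero (e.g. not a repository)
    )
    return result.stdout, time.perf_counter() - start
```

Calling `timed_status(repo)` and then `timed_status(repo, "some/dir")` on the same checkout gives two timings to compare; run each a couple of times so the filesystem cache is equally warm for both.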
Free RAM has a huge influence on how much of the filesystem tree is cached by the kernel. This is visible from just `time find`. It could just be a case of developer workstations going from e.g. 4 GB to 16 GB.
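The `time find` comparison can also be scripted. This is a minimal sketch, with `walk_tree` and `time_walk` as made-up helper names: it walks a tree and reports how long the traversal took, so running it twice in a row on a large checkout shows the gap between a cold and a warm cache (assuming the tree fits in free RAM).

```python
import os
import time


def walk_tree(root):
    """Count every file and directory below root (root itself is not counted)."""
    count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        count += len(dirnames) + len(filenames)
    return count


def time_walk(root):
    """Return (entry_count, elapsed_seconds) for one full traversal of root."""
    start = time.perf_counter()
    count = walk_tree(root)
    return count, time.perf_counter() - start
```

If the second call to `time_walk` on the same root is not noticeably faster than the first, the tree was probably already cached, or it does not fit in the memory the kernel has available for caching.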
> The official reason is that the "internals of Git" weren't conducive to the kinds of invasive changes they needed/wanted. But I think the truth is closer to being that it was going to be too hard/slow to get those invasive changes past the Git mailing list.
Funny, but I'm starting to wonder if there's an affect-based complement to Conway's Law.