by WorldMaker 1538 days ago
At least in the case of git, a surprising amount of the monorepo tooling is making it upstream into git itself. I'm aware of engineering efforts from both Microsoft and Twitter that are in today's git (a lot of the work on things like the git commit-graph and git sparse checkouts in particular is designed for monorepo tooling, though in some cases it benefits smaller repos too).

Microsoft's monorepo tooling has been especially interesting to watch from an engineering standpoint, as seemingly almost all of it has been in the public eye, open source, and in most cases upstreamed. VFS for Git [1] was one of their first approaches (simply virtualizing the git filesystem and proxying it through servers as necessary). Portions of it will never be upstreamed (in particular because it needs OS drivers), but it's all open source, a lot of concepts from it were upstreamed into git itself, and VFS for Git is now mostly considered legacy/deprecated. Microsoft's more recent follow-up tool was Scalar [2], which started as a fork of most of the remaining relevant bits of VFS for Git plus a repo config tool that helped set up sparse clones while the git CLI ("porcelain") for sparse cloning took a bit to catch up with what the "plumbing" could do. Most of that got directly upstreamed into the git "porcelain", and since then so much of Scalar has been upstreamed that its remaining tools are now version-controlled directly in Microsoft's git fork rather than in their own repo.

In terms of raw engineering capability, it seems we are in something of a golden age of open source monorepo tools for those trying to use git for monorepos. Admittedly, having the tools available now doesn't make a monorepo much easier to run than in the era when they were simply unavailable, because there's often still a lot of engineering work needed to keep the tools humming along (in bandwidth and hosting alone).

It's just interesting to see more of the tools available transparently, sometimes because they still benefit even smaller-scale repos. (While VFS for Git is unlikely to be necessary for small/medium repos, there are times when sparse clones can be handy even at medium sizes. A lot of the engineering work upstreamed to make sparse clones performant and capable indirectly benefits repositories of any scale by reducing filesystem reads overall and by adding support for storing computed caches on disk, such as commit-graphs and reachability bitmaps, rather than repeatedly rebuilding them in memory.)
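For anyone who wants to try the sparse clone and on-disk commit-graph pieces mentioned above on a medium-sized repo, here is a rough sketch of the stock git commands involved, wrapped in a small Python helper that only builds the argument lists and does not run anything. The helper name, the example URL, and the directory names are made up for illustration; the git subcommands and flags are the upstream ones as far as I know, but check them against your installed git version.

```python
from typing import List


def sparse_clone_commands(url: str, dest: str, dirs: List[str]) -> List[List[str]]:
    """Build (but do not run) git commands for a blobless, sparse clone.

    Hypothetical helper for illustration only; it returns argv lists that
    could be passed to subprocess.run one at a time.
    """
    if not dirs:
        raise ValueError("need at least one directory to check out")
    return [
        # Partial clone: defer downloading file contents (blobs) until needed,
        # and start with a sparse working tree.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Limit the working tree to the listed directories.
        ["git", "-C", dest, "sparse-checkout", "set", *dirs],
        # Write the commit-graph file to disk so history walks can reuse it.
        ["git", "-C", dest, "commit-graph", "write", "--reachable"],
    ]


if __name__ == "__main__":
    # Example URL and paths are placeholders, not a real repository.
    for cmd in sparse_clone_commands(
        "https://example.com/big-monorepo.git", "big-monorepo", ["services/api", "docs"]
    ):
        print(" ".join(cmd))
```

To actually execute them you would pass each list to `subprocess.run(cmd, check=True)`. Keeping the builder separate from execution makes it easy to inspect what will run before pointing it at a large repository.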

[1] https://github.com/microsoft/vfsforgit

[2] https://github.com/microsoft/scalar


I've been following monorepo support in git for many years (I'm usually the person who knows the most about tools like SCM and build systems on my team, and I spend most of my time saying "no, the hot new thing doesn't work well in practice yet", or "unfortunately it still has some significant downsides for us as a team").

Yes, the monorepo story today is better than ever, and in the last 6 months or so, a few developments have made it maybe even workable for teams like mine. I haven't seriously kicked the tires on the latest developments yet, so it's possible we're in a golden age as you say, but I'm skeptical. To be realistic, "better than ever" doesn't mean "good enough to use without needing significant time investment and deployment complexity cost for medium-sized organizations". As an example, things like VFS for Git carried unreasonable deployment complexity cost compared to the benefit.

Ironically, the monorepo discussion reminds me of the discussions around complex architectures (SOA, fault tolerance, microservices, etc.) vs. simple architectures (file/SQL/dumb KV store and monoliths, etc.). For complex architectures, the curve of benefit vs. system size is a hockey stick: it starts flat and negative, then rises somewhat linearly until it finally becomes beneficial at a certain (arguably very large) size. For monorepos, given the actually existing and available tooling, it's more of a U shape: a monorepo is great for tiny systems, becomes problematic for medium-sized orgs (who usually split repos after reaching a painful size), then plateaus and slowly ticks up again when you become Twitter or whatnot. I think most of us live in the early phase or the middle phase, and almost no organizations live in the final phase (lots of developers who post on HN do though...).

I don't knock anyone who manages to do it at medium or large size with the current tools, especially if they manage through discipline rather than by spending a lot of time on meta-work instead of their actual work, but IME even that hasn't been worth it.

Absolutely. Having access to the tools still doesn't immediately make them cost effective at most organization sizes beyond the organizations that built the tools.