by jeffbee 1538 days ago
The main downside that people always mention is it takes a long time to clone or pull a large repo. This is actually a flaw of git, not a flaw of the monorepo as a concept.

People can't use the concept, they must use an actual tool.

The problem with these frequent monorepo discussion threads is that monorepos are at a significant disadvantage when it comes to good existing and available tools (especially open source ones), but most of the boosters work at companies that mostly use good existing and unavailable tools.

I've no problem with the discussion of course, and largely agree with the conceptual superiority in many cases, but on the practical side, the downsides are still significant and IMO overpowering. I've worked at insanely profitable medium sized companies that would use a monorepo if the tools were there, but instead used svn+externals and then git+a very simple script implementing essentially the same thing as svn:externals. The latter is a great option, IMO especially if you flatten all dependencies to the project/top level (i.e., all transitive dependencies specified and versioned at the top level), as you don't have the A->B->C problem where A using an updated C requires work from team C, B, A; you can just do C, A. It also discourages deeply nested dependencies, and bounds dependency count somewhat, and provides a very explicit and conscientious view of your total dependencies. Updates are also easy to partially automate.
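
To make the "very simple script" idea concrete, here is a rough sketch of what such a tool might look like. This is not the script described above; the manifest name (deps.txt), its "path url rev" line format, and the sync_deps function are all made up for illustration. The demo builds a throwaway local repo so it runs without network access.

```shell
#!/bin/sh
set -eu

# sync_deps MANIFEST
# Hypothetical manifest format: one dependency per line, "<path> <url> <rev>".
# Every dependency, including transitive ones, is listed flat at the top level.
sync_deps() {
  while read -r path url rev; do
    # Skip blank lines and comment lines.
    case "$path" in ''|'#'*) continue ;; esac
    if [ ! -d "$path/.git" ]; then
      mkdir -p "$(dirname "$path")"
      git clone -q "$url" "$path"
    else
      git -C "$path" fetch -q origin
    fi
    # Pin the working copy to the exact revision named in the manifest.
    git -C "$path" checkout -q --detach "$rev"
  done < "$1"
}

# --- Demo: a local stand-in for "team C's" repository ---
work=$(mktemp -d)
git init -q "$work/libc"
cd "$work/libc"
echo "v1" > lib.txt
git add lib.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "v1"
pinned=$(git rev-parse HEAD)
echo "v2" > lib.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -am "v2"

# The consuming project pins the older revision in its manifest.
mkdir "$work/project"
cd "$work/project"
printf '%s\n' "# path url rev" "deps/libc file://$work/libc $pinned" > deps.txt
sync_deps deps.txt
cat deps/libc/lib.txt   # prints: v1
```

Updating a dependency is then a one-line change to the manifest, which is the part that is easy to partially automate.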

At least in the case of git, a surprising amount of the monorepo tooling is making it upstream into git itself. I'm aware of engineering efforts from both Microsoft and Twitter that are in today's git (a lot of the work on things like the git commit-graph and git sparse checkouts in particular is designed for monorepo tooling, though in some cases it benefits smaller repos too).

Microsoft's monorepo tooling has been especially interesting to watch from an engineering standpoint, as seemingly almost all of it has been in the public eye, open source, and in most cases upstreamed. VFS for Git [1] was one of their first approaches (simply virtualizing the git filesystem and proxying it through servers as necessary). Portions of it will never be upstreamed (in particular because it needs OS drivers), but it's all open source and a lot of concepts from it were upstreamed into git itself; VFS for Git is now mostly considered legacy/deprecated. Microsoft's more recent follow-up tool was Scalar [2], which started as a fork of most of the remaining relevant bits of VFS for Git plus a repo config tool that helped set up sparse clones while the git CLI ("porcelain") for sparse cloning took a bit to catch up with what the "plumbing" could do. Most of that got directly upstreamed into the git "porcelain", and since that point so much of Scalar was upstreamed into git that the remaining Scalar tools are now VCed directly in Microsoft's git fork rather than in their own repo.

In terms of raw engineering capability it seems we are in something of a golden age of monorepo tools available as open source, for those trying to use git for monorepos. Admittedly the tools may be available now, but that doesn't make them any easier to work with than the era when they were simply unavailable because there's often a lot of engineering work still to be done to keep the tools humming along (in bandwidth and hosting alone).

It's just interesting to see more of the tools available transparently, sometimes because they still have benefits for even smaller scale repos. (While VFS for Git is unlikely to be necessary for small/medium repos, there are times when sparse clones can be handy at even medium sizes. A lot of the engineering work upstreamed to make sparse clones performant and capable indirectly benefits repositories of any scale by reducing filesystem reads overall and adding support for storing better computed caches on-disk, such as commit-graphs and reachability bitmaps, rather than repetitively rebuilding them in memory.)

[1] https://github.com/microsoft/vfsforgit

[2] https://github.com/microsoft/scalar
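
For anyone who wants to see the on-disk commit-graph mentioned above on a small repo, something like the following should work with a reasonably recent git. It is only a sketch: the throwaway repo and its three commits are made up for the demo.

```shell
#!/bin/sh
set -eu

# Build a tiny throwaway repository with a few commits.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
for i in 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

# Write a commit-graph file covering every commit reachable from the refs.
git commit-graph write --reachable

# Check the file's integrity, then show where it was stored on disk.
git commit-graph verify
ls .git/objects/info/commit-graph
```

On a repo this small the difference is not measurable; the point is only that the cache lives on disk instead of being recomputed by each history walk.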

I've been following monorepo support in git for many years (I'm usually the person who knows the most about tools like SCM and build systems on my team, and I spend most of my time saying "no, the hot new thing doesn't work well in practice yet", or "unfortunately it still has some significant downsides for us as a team").

Yes, the monorepo story today is better than ever, and in the last 6 months or so, a few developments have made it maybe even workable for teams like mine. I haven't seriously kicked the tires on the latest developments yet, so it's possible we're in a golden age as you say, but I'm skeptical. To be realistic, "better than ever" doesn't mean "good enough to use without needing significant time investment and deployment complexity cost for medium sized organizations". As an example, things like VFSforgit carried an unreasonable deployment complexity cost compared to the benefit.

Ironically, the monorepo discussion reminds me of the discussions around complex architectures (SoA, fault tolerance, microservices, etc.) vs. simple architectures (file/SQL/dumb KV store and monoliths, etc.). For complex architectures, the curve of benefit vs. system size is a hockey stick: it starts flat and negative, then rises somewhat linearly until it finally becomes beneficial at a certain (arguably very large) size. For monorepos, given the actually existing and available tooling, it's more of a U shape -- great for tiny systems, problematic for medium sized orgs (who usually split repos after reaching a painful size), then it plateaus and slowly ticks up again when you become Twitter or whatnot. I think most of us live in the early phase or the middle phase, and almost no organizations live in the final phase (lots of developers who post on HN do though...).

I don't knock anyone who manages to do it at medium or large size with the current tools, especially if they manage it through discipline instead of spending a lot of time on meta-work rather than their actual work, but IME even that hasn't been worth it.

Absolutely. Having access to the tools still doesn't immediately make them cost effective at most organization sizes beyond the organizations that built the tools.
I think you can clone just the last commit:

> Provide an argument of --depth 1 to the git clone command to copy only the latest revision of a repo:

  git clone --depth [depth] [remote-url]
Git has some strange behaviors with shallow clones (trying to manage volume using --depth). Shallow clones are great for CI builds, but not so great with working copies used actively by developers.

At this point in git the better tool is called "sparse clones" (using the --sparse flag and some other associated tools, such as git sparse-checkout). A lot of interesting engineering work has been put into making "sparse clones" very performant on "conical sections" ("sparse cones", i.e. cone mode) of a repository at a time. (Along the lines of checking out a single sub-directory of a monorepo and just its history.)
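
Roughly, the workflow looks like the sketch below. The local "monorepo" and its services/api and services/web directories are invented so the demo runs offline. A real setup would often also pass a partial-clone filter such as --filter=blob:none to git clone, which is left out here because the server side has to allow it.

```shell
#!/bin/sh
set -eu

# Build a small throwaway "monorepo" locally so the demo needs no network.
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
mkdir -p services/api services/web
echo "api" > services/api/main.txt
echo "web" > services/web/main.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"

# Clone with --sparse: at first only files in the repository root are checked out.
cd "$work"
git clone -q --sparse "file://$work/mono" sparse
cd sparse

# Switch to cone mode (directory-based patterns), then pick one sub-directory.
# Newer git versions default to cone mode; the explicit init keeps older ones happy.
git sparse-checkout init --cone
git sparse-checkout set services/api

ls services   # prints: api
```

The working tree now only contains services/api; services/web is still in the repository history but is not materialized on disk.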

OK but it still takes a long time with millions of source files.
Is that a flaw with git? Or a flaw with trying to use git for monorepos, vs some other change management built for that kind of repo?