by jkaptur 1538 days ago
> the downsides are already widely discussed.

Does anyone have any useful pointers? I'm in such total agreement with the article that I actually don't know the counterarguments.

4 comments

There are a bunch of downsides, although they are often just the opposite problem from what the monorepo solves.

For example, the article states:

> [In the other direction,] Forcing dependees to update is actually another benefit of a monorepo.

What happens when the other teams that depend on your work don't have the time/priority to update their code to work with the new version of your code? The ideal case that monorepo proponents tout is that the team updating the depended-on code can just update the code for everyone who depends on them… however, that update is not always trivial and might require rework of code deeper inside the other teams' projects. Maybe they are depending on functionality that is going away and it requires major work to change to the new functionality, and the team is working on other high-priority things and can't spend the time right now to update their code.

What does the team do? Do they wait until every team who depends on them is ready to work on updating? Do they try to work out how the depending team is using their code so they can update it themselves? How does this work if there are dozens of teams that use the dependency? You can't have every team that creates core shared code be experts on every other team's work. You can end up stuck waiting for every team to be ready to work on updating this dependency.

Imagine if this was how all dependencies in your code worked, and every build task used the latest release of every dependency regardless of major version bumps. You might wake up on a Tuesday to a failing build and have to spend a week updating your project to use the latest version. Multiply this by all the dependencies and your priority list is no longer your own; you are forced to spend your time fixing dependencies.

This is why we specify versions in our dependencies, so we can update on our own schedule.

Of course, the downside of this is that now you have to support multiple versions of your code, which is the trade-off and the problem a monorepo solves.

You are going to end up with downsides either way; the question is which is worse.

> What does the team do? Do they wait until every team who depends on them is ready to work on updating? Do they try to work out how the depending team is using their code so they can update it themselves?

Versioned multi-repos may solve this for the team[s] demanding incompatible changes to shared code, but any team who was happy to use the shared code as it currently is, and was expecting to also benefit from any upcoming compatible improvements, will see only problems with this "solution".

Better to give the new incompatible behavior a new name. Deprecate the old name. Then callers of the old thing can fix on their own schedule.
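A minimal Python sketch of that rename-and-deprecate approach. The names parse_config and parse_config_v2 are hypothetical and only illustrate the pattern: the old name keeps its behavior and emits a warning, and the incompatible behavior lives under a new name.

```python
import warnings


def _split_pairs(text):
    """Yield (key, value) pairs from 'key = value' lines."""
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            yield key.strip(), value.strip()


def parse_config_v2(text):
    """New, incompatible behavior under a new name: keys are lower-cased."""
    return {key.lower(): value for key, value in _split_pairs(text)}


def parse_config(text):
    """Old behavior, unchanged but deprecated: keys keep their case."""
    warnings.warn(
        "parse_config is deprecated; use parse_config_v2",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this function
    )
    return dict(_split_pairs(text))
```

Callers of parse_config keep working and see a DeprecationWarning, so they can migrate when it suits them; the old name can be removed once nothing calls it.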

> Versioned multi-repos may solve this for the team[s] demanding incompatible changes to shared code, but any team who was happy to use the shared code as it currently is, and was expecting to also benefit from any upcoming compatible improvements, will see only problems with this "solution".

Normally this is solved with semantic versioning... you pin to a minor version, so you get all non-breaking changes, but don't pull in breaking changes.
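As a rough Python sketch of what such a pin means, assuming it is read as "same major version, at least the pinned minor/patch" (the caret-style range some package managers offer). The function names are hypothetical, and the sketch ignores pre-release tags and any special handling of 0.x versions.

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def is_compatible(pinned, candidate):
    """True if candidate keeps the pinned major version and is not older."""
    pin, cand = parse_version(pinned), parse_version(candidate)
    return cand[0] == pin[0] and cand >= pin


def pick_upgrade(pinned, available):
    """Return the newest available release compatible with the pin."""
    compatible = [v for v in available if is_compatible(pinned, v)]
    # Compare numerically so that 1.10.0 sorts after 1.4.2.
    return max(compatible, key=parse_version, default=pinned)
```

With a pin of 1.4.0 and releases 1.3.9, 1.4.2, 1.10.0 and 2.0.0 available, pick_upgrade selects 1.10.0 and never 2.0.0. This only works as intended if the publisher's version numbers actually follow the semantic versioning rules.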

In my experience semantic versioning is more aspirational than something you can actually rely on.

Many times, bug fixes are not back-ported to prior lines of development.

Other times, claims of backward compatibility on minor version releases are wrong.

A monorepo assumes all your IP is either entirely open or entirely closed; otherwise you need a very reliable way to extract the OSS bits and publish them to a mirror without risking exposure of the closed-source IP.
The main downside that people always mention is it takes a long time to clone or pull a large repo. This is actually a flaw of git, not a flaw of the monorepo as a concept.
People can't use the concept, they must use an actual tool.

The problem with these frequent monorepo discussion threads is that monorepos are at a significant disadvantage when it comes to good existing and available tools (especially open source ones), but most of the boosters work at companies that mostly use good existing and unavailable tools.

I've no problem with the discussion of course, and largely agree with the conceptual superiority in many cases, but on the practical side, the downsides are still significant and IMO overpowering. I've worked at insanely profitable medium-sized companies that would use a monorepo if the tools were there, but instead used svn+externals and then git plus a very simple script implementing essentially the same thing as svn:externals. The latter is a great option IMO, especially if you flatten all dependencies to the project/top level (i.e., all transitive dependencies specified and versioned at the top level), as you don't have the A->B->C problem where A using an updated C requires work from teams C, B, and A; you can just do C and A. It also discourages deeply nested dependencies, bounds the dependency count somewhat, and provides a very explicit and conscientious view of your total dependencies. Updates are also easy to partially automate.
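A small Python sketch of the "flatten everything to the top level" idea, under loud assumptions: MANIFEST, REQUIRES, and the revision strings are hypothetical stand-ins for whatever the simple script reads, not a description of the actual script. The check walks the dependency graph and reports any transitive dependency that is not pinned at the top level.

```python
# Hypothetical top-level manifest for project A: every dependency, direct or
# transitive, is pinned here to an exact revision.
MANIFEST = {
    "B": "rev-b-41",
    "C": "rev-c-17",
}

# Hypothetical record of what each component needs (names only, no versions,
# because versions are decided solely by the top-level manifest).
REQUIRES = {
    "A": ["B"],
    "B": ["C"],
    "C": [],
}


def missing_pins(project, manifest, requires):
    """Return the transitive dependencies of project that are not pinned."""
    missing, seen = set(), set()
    stack = list(requires.get(project, []))
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name not in manifest:
            missing.add(name)
        stack.extend(requires.get(name, []))
    return sorted(missing)
```

Because C is pinned by A directly, A can move to a newer C by changing one manifest entry, without waiting for B to publish a release that references it.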

At least in the case of git, a surprising amount of the monorepo tooling is making it upstream into git itself. I'm aware of engineering efforts from both Microsoft and Twitter that are in today's git (a lot of the work on things like the git commit-graph and git sparse checkouts in particular is designed for monorepo tooling, though in some cases it benefits smaller repos too).

Microsoft's monorepo tooling has been especially interesting to watch from an engineering standpoint, as seemingly almost all of it has been in the public eye, open source, and in most cases upstreamed. VFS for Git [1] was one of their first approaches (simply virtualizing the git filesystem and proxying it through servers as necessary), and while portions of it will never be upstreamed (in particular because it needs OS drivers), it's all open source, a lot of concepts from it were upstreamed into git itself, and VFS for Git is now mostly considered legacy/deprecated. Microsoft's more recent follow-up tool was Scalar [2], which started as a fork of most of the remaining relevant bits of VFS for Git plus a repo config tool that helped set up sparse clones while the git CLI ("porcelain") for sparse cloning took a while to catch up with what the "plumbing" could do. Most of that got directly upstreamed into the git "porcelain", and since then so much of Scalar has been upstreamed into git that the remaining tools of Scalar are now version-controlled directly in Microsoft's git fork rather than in its own repo.

In terms of raw engineering capability it seems we are in something of a golden age of monorepo tools available as open source, for those trying to use git for monorepos. Admittedly, the tools being available now doesn't make them any easier to work with than in the era when they were simply unavailable, because there's often a lot of engineering work still to be done to keep the tools humming along (in bandwidth and hosting alone).

It's just interesting to see more of the tools available transparently, sometimes because they still have benefits even for smaller-scale repos. (While VFS for Git is unlikely to be necessary for small/medium repos, there are times when sparse clones can be handy even at medium sizes. A lot of the engineering work upstreamed to make sparse clones performant and capable indirectly benefits repositories of any scale by reducing filesystem reads overall and by adding support for storing better computed caches on disk, such as commit-graphs and reachability bitmaps, rather than repetitively rebuilding them in memory.)

[1] https://github.com/microsoft/vfsforgit

[2] https://github.com/microsoft/scalar

I've been following monorepo support in git for many years (I'm usually the person who knows the most about tools like SCM and build systems on my team, and I spend most of my time saying "no, the hot new thing doesn't work well in practice yet", or "unfortunately it still has some significant downsides for us as a team").

Yes, the monorepo story today is better than ever, and in the last 6 months or so, a few developments have made it maybe even workable for teams like mine. I haven't seriously kicked the tires on the latest developments yet, so it's possible we're in a golden age as you say, but I'm skeptical. To be realistic, "better than ever" doesn't mean "good enough to use without needing significant time investment and deployment complexity cost for medium-sized organizations". As an example, things like VFS for Git carried unreasonable deployment complexity cost compared to the benefit.

Ironically, the monorepo discussion reminds me of the discussions around complex architectures (SOA, fault tolerance, microservices, etc.) vs. simple architectures (file/SQL/dumb KV store and monoliths, etc.). For complex architectures, the curve of benefit vs. system size is a hockey stick, starting flat and negative, then rising somewhat linearly until it finally becomes beneficial at a certain (arguably very large) size. For monorepos, given the actually existing and available tooling, it's more of a U shape -- for tiny systems, it's great; it becomes problematic for medium-sized orgs, who usually split repos after reaching a painful size; then it plateaus and slowly ticks up again when you become Twitter or whatnot. I think most of us live in the early phase or the middle phase, and almost no organizations live in the final phase (lots of developers who post on HN do though...).

I don't knock anyone who manages to do it at medium or large size with the current tools, especially if they manage through discipline rather than by spending a lot of time on meta-work instead of their actual work, but IME even that has not been worth it.

Absolutely. Having access to the tools still doesn't immediately make them cost effective at most organization sizes beyond the organizations that built the tools.
I think you can clone just the last commit:

> Provide an argument of --depth 1 to the git clone command to copy only the latest revision of a repo:

  git clone --depth [depth] [remote-url]
Git has some strange behaviors with shallow clones (trying to manage volume using --depth). Shallow clones are great for CI builds, but not so great with working copies used actively by developers.

At this point in git the better tool is called "sparse clones" (using the --sparse option to git clone and the associated git sparse-checkout command). A lot of interesting engineering work has been put into making "sparse clones" very performant on "cones" of a repository at a time (git's "cone mode", where the checkout is restricted to whole directories). (Think of checking out a single sub-directory of a monorepo and just its history.)
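For illustration, a Python sketch that builds the two git commands a sparse clone of a few directories might use. It assumes a reasonably recent Git in which git sparse-checkout set accepts --cone; the helper names, and the URL and paths in the usage note, are hypothetical.

```python
import subprocess


def sparse_clone_commands(remote_url, target_dir, subdirs):
    """Build the git commands for a sparse clone limited to some directories."""
    return [
        # --sparse starts with only the files at the repository root checked out.
        ["git", "clone", "--sparse", remote_url, target_dir],
        # Cone mode restricts the working copy to whole directories.
        ["git", "-C", target_dir, "sparse-checkout", "set", "--cone", *subdirs],
    ]


def sparse_clone(remote_url, target_dir, subdirs):
    """Run the commands in order, stopping at the first failure."""
    for command in sparse_clone_commands(remote_url, target_dir, subdirs):
        subprocess.run(command, check=True)
```

Calling sparse_clone("https://example.com/big/monorepo.git", "monorepo", ["services/billing"]) would leave a working copy containing the root files plus services/billing. Note that --sparse by itself only limits what is checked out into the working copy.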

OK but it still takes a long time with millions of source files.
Is that a flaw with git? Or a flaw with trying to use git for monorepos, vs some other change management built for that kind of repo?
Principle of least privilege springs to mind but I'm not familiar with the other issues.