Hacker News
by woolvalley 2728 days ago
My org went from a polyrepo, with 10-commit semver dependency hell whenever we updated an internal API, to a monorepo, and it saves a lot of time. Unmigrated semver-breaking changes are a form of technical debt, and it takes a lot more total man-hours to do the 'proper' one-repo-at-a-time, many-commit polyrepo migration than to do it the other way around.

If we had the tooling to do multirepo atomic commits and reviews, then maybe we would have stuck with polyrepos, but it doesn't really exist out in the wild, so monorepo it was.


> Unmigrated semver breaking changes are a form of technical debt

Maybe you can clear up my confusion. If Module B is dependent on Module A, then every version of B should refer to a specific version of A, correct? What is there to break? Development can continue on A without interfering with B, and then you can bump B once it points to a later A.
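A minimal sketch of the pinning described in the question, assuming caret-style semver ranges (accept any newer release with the same major version) as many package managers use for versions at or above 1.0.0. The function names are hypothetical. It shows where the "break" comes from: B is safe on its pin, but a major bump in A falls outside B's range, so someone has to migrate B by hand.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_compatible(pinned: str, candidate: str) -> bool:
    """Return True if `candidate` satisfies the caret range ^pinned.

    Assumes the common caret rule for versions >= 1.0.0: same major
    version, and not older than the pinned version.
    """
    p, c = parse(pinned), parse(candidate)
    return c[0] == p[0] and c >= p


# Hypothetical pin: module B depends on module A at ^2.3.0.
print(caret_compatible("2.3.0", "2.7.1"))  # True: B can pick this up with no code changes
print(caret_compatible("2.3.0", "3.0.0"))  # False: B needs an explicit migration
```

Every consumer left on the old major line is a candidate for the deferred maintenance discussed in the reply.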

I'm not sure what this has to do with the mono/poly discussion.

Engineering resources are not unlimited, so naturally bug fixes and new features land only on master rather than on the 2 to 5 older major semver branches that exist because 2 to 5 module Bs haven't bothered updating yet. If you maintain 5 separate branches, you're spending that much more engineering effort for little benefit, because you don't have external customers. So the modules that haven't migrated yet decay in a state of deferred maintenance, which is a form of technical debt.
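To put numbers on the deferred-maintenance argument, here is a small sketch over a hypothetical snapshot of which major version of A each consumer pins (the consumer names and versions are invented for illustration). It lists the consumers that still need a migration and the major lines that would need fixes backported if nobody migrates.

```python
# Hypothetical snapshot: the major version of module A that each consumer pins.
consumer_pins = {"billing": 5, "search": 5, "reports": 3, "auth": 4, "mobile-api": 3}
LATEST_MAJOR = 5


def lagging_consumers(pins: dict[str, int], latest: int) -> list[str]:
    """Consumers still pinned to an older major version of A."""
    return sorted(name for name, major in pins.items() if major < latest)


def majors_to_support(pins: dict[str, int], latest: int) -> list[int]:
    """Distinct major lines that need bug fixes until every consumer migrates."""
    return sorted(set(pins.values()) | {latest})


print(lagging_consumers(consumer_pins, LATEST_MAJOR))  # ['auth', 'mobile-api', 'reports']
print(majors_to_support(consumer_pins, LATEST_MAJOR))  # [3, 4, 5]
```

The trade-off in the comment is between the two outputs: either migrate each lagging consumer (one commit per repo in a polyrepo) or keep backporting to every older major line.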

To avoid that, you do 10 migration commits so everyone is on the latest version. If you're going to do that as standard operating procedure anyway, you might as well make it far easier and have a monorepo.

My org went from a monorepo where every project had to obey the same CI model and you could not introduce entirely new CI tools for new prototypes over to a polyrepo with separated semver library repos for shared dependencies, and it simplified everything so much.

Adding additional PRs across different repos is functionally no different from the same PR with scattered dependencies in a monorepo, except that separating the PRs makes each isolated set of changes more atomic and focused. That has led to fewer bugs and better-quality code review. And the biggest win: each repo is free to use whatever CI & deployment tooling it needs, with absolutely no constraints based on whatever CI or deployment tool another chunk of code in some other repo uses.

The last point is not trivial. Lots of people glibly assume you can create monorepo solutions where arbitrary new projects inside the monorepo are free to use whatever resource provisioning strategy or language or tooling they want, but in reality this is not true. There is an implicit bias to rely on the existing tooling (even if it's not right for the job), and monorepos beget monopolicies, where experimentation that violates some monorepo decision can be wholly prevented by political blockers in the name of the monorepo.

One example that has frustrated me personally is working on machine learning projects that require complex runtime environments with custom-compiled dependencies, GPU settings, etc.

The clear choice for us was to use Docker containers to deliver the built artifacts to the necessary runtime machines, but the whole project was killed when someone from our central IT monorepo tooling team said no. His reasoning was that all the existing model training jobs in our monorepo worked as Luigi tasks executed in Hadoop.

We tried explaining that our model training was not amenable to a MapReduce-style calculation, and that our plan was for a Luigi task to invoke the entrypoint command of the container to start a single, non-distributed training process. (I have specific expertise in this type of model training, so I know from experience this is an effective solution and that MapReduce would not be appropriate.)
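A minimal sketch of the plan described above, assuming the training image's ENTRYPOINT is the training script and that the scheduler task (a Luigi task in the comment) only needs to shell out to `docker run`. The image name, mount paths, and the `--data`/`--output` entrypoint flags are hypothetical; `--gpus` requires Docker 19.03+ with the NVIDIA Container Toolkit installed. No Luigi API is shown here.

```python
import shlex
import subprocess


def training_command(image: str, data_dir: str, output_dir: str, gpus: str = "all") -> list[str]:
    """Build a `docker run` argv that starts one non-distributed training process.

    Arguments after the image name are passed to the image's ENTRYPOINT.
    """
    return [
        "docker", "run", "--rm",
        "--gpus", gpus,                     # expose GPUs to the container
        "-v", f"{data_dir}:/data:ro",       # training data, mounted read-only
        "-v", f"{output_dir}:/output",      # where the trained model is written
        image,
        "--data", "/data", "--output", "/output",  # hypothetical entrypoint flags
    ]


def run_training(image: str, data_dir: str, output_dir: str) -> None:
    """What a scheduler task's run step could call; blocks until training ends."""
    cmd = training_command(image, data_dir, output_dir)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

For example, `run_training("registry.example.com/ml/train:1.0", "/mnt/data", "/mnt/models")` (hypothetical image and paths) would run one container to completion. The point of the design is that the orchestrator stays unchanged and only the runtime environment moves into the image.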

But it didn’t matter. The monorepo was set up to assume model training compute jobs had to work one way and only one way, and so it set us back months from training a simple model directly relevant to urgent customer product requests.

Had we been able to set this up as a separate repo where there were no global rules over how all compute jobs must be organized, and used our own choice of deployment (containers) with no concern over whatever other projects were using / doing, we could have solved it in a matter of a few days.

In my experience, this type of policy blocker is uniquely common to monorepos, and easily avoided in polyrepo situations. It’s just a whole class of problem that rarely applies in a polyrepo setting, but almost always causes huge issues with monorepo policies and fixed tooling choices that end up being a poor fit for necessary experiments or innovative projects that happen later.

> each repo is free to use whatever CI & deployment tooling it needs, with absolutely no constraints based on whatever CI or deployment tool another chunk of code in some other repo uses.

Hear, hear. Let teams choose the processes and tools that work best for them. In previous release engineering positions, I resisted the many attempts to introduce a single standard workflow for all projects. The support burden of letting a thousand flowers bloom was not small, but the benefit was that devs understood their project and were empowered to make changes when the business requirements changed faster than standardized tooling could.

We had a few contracts for standard behaviours, but they were low-overhead: respond to 'make' and 'make test', have a /status endpoint that returned a 500 when the service was unhealthy, register a port in the service conf repo, etc.
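The /status contract is small enough to sketch with Python's standard library alone. This is an illustration, not the commenter's actual setup: `is_healthy` is a hypothetical placeholder for whatever dependency checks a real service would run, and the JSON body is an assumption; the contract only requires a 500 when unhealthy.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def is_healthy() -> bool:
    """Hypothetical health check; a real service would probe its own dependencies."""
    return True


class StatusHandler(BaseHTTPRequestHandler):
    # Stored as a staticmethod so it can be swapped out (e.g. in tests).
    health_check = staticmethod(is_healthy)

    def do_GET(self) -> None:
        if self.path != "/status":
            self.send_error(404)
            return
        healthy = self.health_check()
        body = json.dumps({"healthy": healthy}).encode()
        # The contract: 200 when healthy, 500 when not.
        self.send_response(200 if healthy else 500)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging


def make_server(port: int, host: str = "") -> HTTPServer:
    """Create (but do not start) a server; call serve_forever() on the result."""
    return HTTPServer((host, port), StatusHandler)
```

A service would call `make_server(port).serve_forever()` with the port it registered in the service conf repo. Any monitoring system can then treat every service the same way, regardless of the language or tooling behind it.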

> except that separating the PRs makes each isolated set of changes more atomic and focused

It makes it less atomic if you need simultaneous changes in multiple repositories.

> Had we been able to set this up as a separate repo where there were no global rules over how all compute jobs must be organized, and used our own choice of deployment (containers) with no concern over whatever other projects were using / doing, we could have solved it in a matter of a few days.

I think this was an organisational problem, but I accept the argument that monorepos will provide a seed around which such pathologies can crystallise. But I don't believe it's the only such seed and I don't think it's an inevitable outcome from monorepos.

> It makes it less atomic if you need simultaneous changes in multiple repositories.

No, each individual set of changes is more atomic (smaller in scope, mutating a system from one state of functionality to a new state of functionality).

The problem is that it’s a linguistic fallacy to act like in the monorepo case “the system” is the sum of a bunch of separate systems (it isn’t, because they are not logically required to depend on simultaneously transitioning). So in that monorepo case, to move subcomponent A from some state of functionality to a new state of functionality, you unfortunately have to also make sure you include totally unrelated (from subcomponent A’s point of view) changes that also correctly transition subcomponent B to a new state of functionality, and subcomponent C, etc., which is exactly less atomic (to transition states, you are required to have simultaneous other transitions that are not logically required for any reason other than the superficial sake of the monorepo).

> simultaneous other transitions that are not logically required for any reason other than the superficial sake of the monorepo

I don't see what's superficial about "everything everywhere is in sync", myself.

And I have absolutely seen PR race conditions. Assuming that everyone perfectly sliced up the polyrepo on the first go is optimistic.

> “I don't see what's superficial about "everything everywhere is in sync", myself.”

Well it is superficial by definition, because two unrelated things are “in sync” only because you say so. The very meaning of “in sync” in your sentence is some particular superficial standard you chose that has nothing to do with the logical requirements of the isolated subcomponents (i.e. “in sync” meaning two independent subcomponents were adjusted in the same large commit or PR is, by definition, superficial... it’s just a cosmetic notion of “in sync” you chose for reasons unrelated to any type of requirement).

I work on a polyrepo. The code in repo A has a dependency on the code in repo B. When I update B, I sometimes need to update A.

In a monorepo that's already done when I finish working on the modules in B.

That I am unable to release from A until it has been synced with the module in B is not "a cosmetic notion". It's being unable to release. I consider releasability at all times to be the most important invariant to be sought by the combination of tests, CI and version control.
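The releasability point can be made concrete with a dependency graph. A minimal sketch, with hypothetical repository names: after B changes, every repository that transitively depends on B may need a follow-up change before it can release. In a polyrepo each of those is typically a separate PR; in a monorepo the same set can be updated in one commit. The graph is assumed to be acyclic.

```python
from collections import deque

# Hypothetical graph: repo -> repos it depends on.
DEPENDS_ON = {
    "A": {"B"},
    "B": set(),
    "C": {"A"},
    "D": {"B", "C"},
    "E": set(),
}


def dependents_of(changed: str, depends_on: dict[str, set[str]]) -> list[str]:
    """Return every repo that transitively depends on `changed`, sorted by name."""
    # Invert the edges so we can walk from a dependency to its consumers.
    consumers: dict[str, set[str]] = {repo: set() for repo in depends_on}
    for repo, deps in depends_on.items():
        for dep in deps:
            consumers.setdefault(dep, set()).add(repo)

    # Breadth-first walk over consumers, visiting each repo once.
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, set()):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)


print(dependents_of("B", DEPENDS_ON))  # ['A', 'C', 'D']
print(dependents_of("E", DEPENDS_ON))  # []
```

Note that this returns the affected set, not a release order; releasing in dependency order would additionally need a topological sort.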

This has nothing to do with monorepos, though. It's entirely a company policy issue. There's nothing about the monorepo that prevents you from writing a script that runs on pre-commit and builds and deploys via Docker to a test cluster.

Unless you mean your presubmit test would push to production machines, that's bad and shouldn't be allowed, but again has nothing to do with a monorepo.

A company could just as easily have draconian policy about testing and deployment and multiple repos. Maybe you could break the rules (hell you could have broken the rules in monorepo land), but again, that's just a rules issue, not an issue of the repository.

If a tool begets using it wrong all the time, then after a certain point, it’s the tool’s fault.

What you’re saying amounts to something of a No True Scotsman fallacy... “no _real_ monorepo would limit different projects from using individualized tooling if needed...” Yet that limitation suspiciously coexists with monorepo tooling frequently, and does not frequently coexist with polyrepo tooling.

>If a tool begets using it wrong all the time

This is the (wrong) assumption. Like I said, there's nothing about a monorepo that "begets" draconian policy. Your anecdotal experience is not a rule. The monorepo I work with doesn't have draconian policies about how tooling must work. There are APIs and recommended tools, and if those don't fit your needs (which is unusual), the teams that maintain those tools are willing to support your use case; if not, you're also free to hack together something that works yourself. Writing additional pre-commit hooks is encouraged.

> What you’re saying amounts to something of a No True Scotsman fallacy

Again, no. Certainly monorepos can do this. They're still real monorepos. But polyrepos can too. They're still polyrepos. It's orthogonal.

There absolutely is something about monorepos that begets monopolicies: exactly the very thing that makes them co-occur. It doesn’t matter if it’s sociological or technological, the co-occurrence itself is the thing.

I’d flip it around and say instead that you are assuming the properties by which to compare the two approaches ought to be properties that are roughly like “first principles” and that no first principles difference really exists between them in terms of limiting what you can do.

But this is the wrong way to look at it because, pragmatically, that is simply not what actually happens sociologically as a side effect. Who cares whether there's a first-principles reason for them to differ in effectiveness? I certainly don't; they just do differ in effectiveness.

>There absolutely is something about monorepos that begets monopolicies

Correlation (and a weak one at that) is not causation.

I can just as easily suggest that monopolicies beget monorepos, and that indeed makes a lot more sense. It's easier to enforce global standards when there's a single repo, so companies that wish to enforce draconian standards may move in that direction. That says nothing about companies that don't wish to enforce draconian standards, though.