Hacker News new | ask | show | jobs
by mlthoughts2018 2722 days ago
My org went from a monorepo where every project had to obey the same CI model and you could not introduce entirely new CI tools for new prototypes over to a polyrepo with separated semver library repos for shared dependencies, and it simplified everything so much.

Adding additional PRs across different repos is functionally no different than the same PR with scattered dependencies in a monorepo, except that separating the PRs makes each isolated set of changes more atomic and focused, which has led to fewer bugs and better quality code review and, the hugest win, each repo is free to use whatever CI & deployment tooling it needs, with absolutely no constraints based on whatever CI or deployment tool another chunk of code in some other repo uses.

The last point is not trivial. Lots of people glibly assume you can create monorepo solutions where arbitrary new projects inside the monorepo can be free to use whatever resource provisioning strategy or language or tooling or whatever, but in reality this not true, both because there is implicit bias to rely on the existing tooling (even if it’s not right for the job) and monorepos beget monopolicies where experimentation that violates some monorepo decision can be wholly prevented due to political blockers in the name of the monorepo.

One example that has frustrated me personally is when working on machine learning projects that require complex runtime environments with custom compiled dependencies, GPU settings, etc.

The clear choice for us was to use Docker containers to deliver the built artifacts to the necessary runtime machines, but the whole project was killed when someone from our central IT monorepo tooling team said no. His reasoning was that all the existing model training jobs in our monorepo worked as luigi tasks executed in hadoop.

We tried explaining that our model training was not amenable to a map reduce style calculation, and our plan was for a luigi task to invoke the entrypoint command of the container to initiate a single, non-distributed training process (I have specific expertise in this type of model training, so I know from experience this is an effective solution and that map reduce would not be appropriate).

But it didn’t matter. The monorepo was set up to assume model training compute jobs had to work one way and only one way, and so it set us back months from training a simple model directly relevant to urgent customer product requests.

Had we been able to set this up as a separate repo where there were no global rules over how all compute jobs must be organized, and used our own choice of deployment (containers) with no concern over whatever other projects were using / doing, we could have solved it in a matter of a few days.

In my experience, this type of policy blocker is uniquely common to monorepos, and easily avoided in polyrepo situations. It’s just a whole class of problem that rarely applies in a polyrepo setting, but almost always causes huge issues with monorepo policies and fixed tooling choices that end up being a poor fit for necessary experiments or innovative projects that happen later.

3 comments

> each repo is free to use whatever CI & deployment tooling it needs, with absolutely no constraints based on whatever CI or deployment tool another chunk of code in some other repo uses.

Hear, hear. Let teams choose the processes and tools that work best for them. In previous release engineering positions, I resisted the many attempts to instroduce a single standard workflow for all projects. The support burden of letting a thousand flowers bloom was not great, but the benefit was that devs understood their project and were empoiwered to make changes when the business requirements changed faster than standardized tooling could.

We had a few contracts for standard behaviours, but they were low-overhead: must respond to 'make/make test', have a /status endpoint that 500'd when it was unhealthy, register a port in the service conf repo, etc.

> except that separating the PRs makes each isolated set of changes more atomic and focused

It makes it less atomic if you need simultaneous changes in multiple repositories.

> Had we been able to set this up as a separate repo where there were no global rules over how all compute jobs must be organized, and used our own choice of deployment (containers) with no concern over whatever other projects were using / doing, we could have solved it in a matter of a few days.

I think this was an organisational problem, but I accept the argument that monorepos will provide a seed around which such pathologies can crystallise. But I don't believe it's the only such seed and I don't think it's an inevitable outcome from monorepos.

> It makes it less atomic if you need simultaneous changes in multiple repositories.

No, each individual set of changes is more atomic (smaller in scope, mutating a system from one state of functionality to a new state of functionality).

The problem is that it’s a linguistic fallacy to act like in the monorepo case “the system” is the sum of a bunch of separate systems (it isn’t, because they are not logically required to depend on simultaneously transitioning). So in that monorepo case, to move subcomponent A from some state of functionality to a new state of functionality, you unfortunately have to also make sure you include totally unrelated (from subcomponent A’s point of view) changes that also correctly transition subcomponent B to a new state of functionality, and subcomponent C, etc., which is exactly less atomic (to transition states, you are required to have simultaneous other transitions that are not logically required for any reason other than the superficial sake of the monorepo).

> simultaneous other transitions that are not logically required for any reason other than the superficial sake of the monorepo

I don't see what's superficial about "everything everywhere is in sync", myself.

And I have absolutely seen PR race conditions. Assuming that everyone perfectly sliced up the polyrepo on the first go is optimistic.

> “I don't see what's superficial about "everything everywhere is in sync", myself.”

Well it is superficial by definition, because two unrelated things are “in sync” only because you say so. The very meaning of “in sync” in your sentence is some particular superficial standard you chose that has nothing to do with the logical requirements of the isolated subcomponents (i.e. “in sync” meaning two independent subcomponents were adjusted in the same large commit or PR is, by definition, superficial... it’s just a cosmetic notion of “in sync” you chose for reasons unrelated to any type of requirement).

I work on a polyrepo. The code in repo A has a dependency on the code in repo B. When I update B, I sometimes need to update A.

In a monorepo that's already done when I finish working on the modules in B.

That I am unable to release from A until it has been synced with the module in B is not "a cosmetic notion". It's being unable to release. I consider releasability at all times to be the most important invariant to be sought by the combination of tests, CI and version control.

> “In a monorepo that's already done when I finish working on the modules in B.”

This is not usually true in monorepos or polyrepos, and is quite a dangerous practice that nobody should use and hasn’t got much at all to do with what type of repo you use.

I worked in a monorepo for a long time where you still had to deploy versioned artifacts. So when you makes changes to B, you still have to bump version IDs, pass deployment requirements and upload the new version of B to internal pypi or internal maven or internal artifactory, etc.

Then consumer app A needs to update its version of B, test out that it works and that, from app A’s point of view, it is ready and satisfied to opt-in to B’s new changes, and do build + deploy of its own to deploy A with upgraded B.

Doing this in a way where a successful merge to master (or equivalent notion) of a change for B is suddenly a de facto upgrade for all the consumers of B is insanely bad for so many reasons that I’m not even going to try to list them all. Monorepo or not, nobody should be doing that, that is bonkers, crazy town bad. It’s a similar order of magnitude of bad as naively checking credentials into version control.

This has nothing to do with monorepos though. Its entirely a company policy issue. There's nothing about the monorepo that prevents you from writing a script that ran on precommit and built and deployed via docker to a test cluster.

Unless you mean your presubmit test would push to production machines, that's bad and shouldn't be allowed, but again has nothing to do with a monorepo.

A company could just as easily have draconian policy about testing and deployment and multiple repos. Maybe you could break the rules (hell you could have broken the rules in monorepo land), but again, that's just a rules issue, not an issue of the repository.

If a tool begets using it wrong all the time, then after a certain point, it’s the tool’s fault.

What you’re saying amounts to something of a No True Scotsman fallacy... “no _real_ monorepo would limit different projects from using individualized tooling if needed...” Yet that limitation suspiciously coexists with monorepo tooling frequently, and does not frequently coexist with polyrepo tooling.

>If a tool begets using it wrong all the time

This is the (wrong) assumption. Like I said, there's nothing about a monorepo that "begets" draconian policy. Your anecdotal experience is not a rule. The monorepo I work with doesn't have draconian policies about how tooling must work. There are apis and recommended tools, and if those don't fit your needs (which is unusual), the teams that maintain those tools are willing to support your uses, but if not, you're also free to hack yourself something that works. Writing additional pre-commit hooks is encouraged.

> What you’re saying amounts to something of a No True Scotsman fallacy

Again, no. Certainly monorepos can do this. They're still real monorepos. But polyrepos can too. They're still polyrepos. Its orthogonal.

There absolutely is something about monorepos that begets monopolicies: exactly the very thing that makes them co-occur. It doesn’t matter if it’s sociological or technological, the co-occurrence itself is the thing.

I’d flip it around and say instead that you are assuming the properties by which to compare the two approaches ought to be properties that are roughly like “first principles” and that no first principles difference really exists between them in terms of limiting what you can do.

But this is the wrong way to look at it because, pragmatically, it’s simply just not the sociological phenomenon that actually happens as a side effect in terms of the practical result. Who cares if there’s a first principles reason for them to be different in terms of effectiveness? I certainly don’t— they just are different in terms of effectiveness.

>There absolutely is something about monorepos that begets monopolicies

Correlation (and a weak one at that) is not causation.

I can just as easily suggest that monopolicies beget monorepos, and that indeed makes a lot more sense. Its easier to enforce global standards when there's a single repo. So companies who wish to enforce draconian standards may move in that direction. That says nothing about companies that don't wish to enforce draconian standards though.

If monorepos enable monopolicies (even if monorepos are compatible with usage that doesn’t involve monopolicies), that’s perfectly good reason to avoid them regardless of whether they cause monopolicies.

In “The Beginning of Infinity,” physicist David Deutsch makes a point like this about styles of government. Deutsch suggests the defining characteristic of a good governmental system should not be whether it consistently produces good policies, but instead that there is an extremely low-cost barrier to removing bad policies once it becomes clear they are bad.

Thinking this way, if monorepos permit a situation where there are monopolicies about allowed languages, allowed deployment tooling, etc., and those policies cannot be quickly discarded when it becomes clear they are bad for a certain business goal, then this is perfectly good reason to disfavor monorepos regardless of whether they cause the bad policies.

I think your responses continue to miss the point because you’re talking about correlation and causation as if it matters in a situation like this: but it precisely doesn’t matter.

If a tool doesn’t actively prevent certain policy failure modes (even if it does not cause anyone to choose a bad policy), that is a relative failure of the tool.

Contrasting with polyrepos where it is quite harder to enforce failed monopolicy ideas is one area where polyrepos are a better tool: to misuse polyrepos policy-wise you have to go way out of your way and add a lot of draconian policy enforcement tooling that often can still be circumvented. Those inherent barriers are a good thing that monorepos don’t have.

Separately, I’d also say that the political failure mode where central IT wants to enforce draconian policies is extremely common, and those types of organizations specifically see a monorepo as a tool of (their desired) oppression and control.

Since the base rate of occurrence of horrible companies is super high among all companies, it probably does mean that P(bad | monorepo) is pretty high conditional evidence of a bad workplace culture.