|
We didn't have a merge queue at Google. If there was a merge conflict, you rebased, ran through CI again, and hoped there wasn't another merge conflict. I ran into merge conflicts maybe once a year, if that. I think the success of this system breaks down into several parts:

1) Yup, microservices. You could submit your proto change, which would affect all clients, before actually implementing the code that used the new feature. (Or after, in the case of renaming some field from foo to deprecated_foo and refactoring the clients to stop using that field.) That means you could wrangle that change without having to worry about it affecting your actual feature. (Typically proto changes did not cause any breakages, since people were very conservative about what changes they would make. Nobody renames all the fields, invalidating dependent code, or renumbers the fields, invalidating all existing messages. You COULD do those things, but nobody ever did.)

2) Clear dependencies in the build system. The CI system only had to run a small set of tests for most changes, because it knew exactly which tests the change would affect. You had to go way out of your way to depend on code without informing the build system. This is very different from every CI system that I've seen outside of Google, which all seem to default to running everything and hoping your programming language or build system magically tracks dependencies. It doesn't; Docker, for example, will happily reuse an image that it thinks hasn't changed, without actually checking whether it has. (Consider building your app on top of golang:latest. Go is updated, and Docker may or may not pull that new base image. Meanwhile, Docker will happily invalidate its build cache if you edit README.md and no code. The result is that 50% of the time you waste 10 minutes rebuilding stuff that didn't change, and 50% of the time you get an outdated build. And nobody seems to care at all!)

3) Being careful about keeping changes small.
I don't know what the average CL size is, but I would aim for 100 lines changed rather than 1000 lines changed. This is something that surprised me post-Google: people go away and work for a week, and you have a 2000 line PR to review. Those are tough to merge, and they were relatively rare in my experience at Google. It is not always possible to make every change small, but that should be the norm. Figure out how much work you can do in a day, and try to make a CL/PR that is that size. A lot can churn in a week. A lot less churns in a day.

If you respected points 1 and 2, your tests will run fast and it's unlikely that your merge will fail between CI and actually merging. If you have 2000 lines of code across 8 services... you'll probably never get it merged. But I am sure that I have successfully merged ginormous changes before; it's just more work.

All in all, my takeaway from this article is that Shopify is huge, but I'm surprised that specialized merge tooling was necessary. I wonder what the underlying problem is; do they really have a 1000 developer monolith? Do they not use a proper build system like Bazel? |
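
To make point 2 a bit more concrete, here is a minimal sketch of how a CI system can pick only the tests a change could affect, assuming it has an explicit dependency graph to work from. This is not how Google's internal tooling or Bazel is implemented; the target names, the `DEPS` table, and the `:test` naming convention are all made up for illustration. The idea is just to invert the graph and walk from the changed targets to everything that depends on them.

```python
from collections import defaultdict, deque

# Hypothetical build graph: each target lists the targets it depends on.
# All target names here are invented for the example.
DEPS = {
    "//proto:user": [],
    "//lib/auth": ["//proto:user"],
    "//lib/billing": ["//proto:user"],
    "//lib/render": [],
    "//svc/login:test": ["//lib/auth"],
    "//svc/checkout:test": ["//lib/auth", "//lib/billing"],
    "//svc/web:test": ["//lib/render"],
}


def affected_tests(deps, changed):
    """Return the test targets that transitively depend on any changed target."""
    # Invert the graph so we can walk from a target to its dependents.
    rdeps = defaultdict(set)
    for target, target_deps in deps.items():
        for dep in target_deps:
            rdeps[dep].add(target)

    # Breadth-first walk over reverse dependencies.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in rdeps[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)

    # Assumption: test targets are the ones whose name ends in ":test".
    return sorted(t for t in seen if t.endswith(":test"))


print(affected_tests(DEPS, ["//lib/billing"]))  # ['//svc/checkout:test']
print(affected_tests(DEPS, ["//proto:user"]))   # ['//svc/checkout:test', '//svc/login:test']
print(affected_tests(DEPS, ["//lib/render"]))   # ['//svc/web:test']
```

A change to one library runs one test target, while a change to the shared proto fans out to every service that uses it. That only works if the graph is complete, which is why depending on code without telling the build system is the thing to avoid.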