by pastor_elm 2620 days ago
Sounds so much simpler outside the context of a 'research' paper:

>When an engineer attempts to land their commit, it gets enqueued on the Submit Queue. This system takes one commit at a time, rebases it against master, builds the code and runs the unit tests. If nothing breaks, it then gets merged into master. With Submit Queue in place, our master success rate jumped to 99%.

https://eng.uber.com/ios-monorepo/
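
For anyone who wants the quoted process spelled out, here is a minimal sketch of a one-commit-at-a-time queue. It is not Uber's implementation: `run_submit_queue` and `builds_ok` are made-up names, commits are plain strings, and "rebase against master" is modeled as appending the commit to a list.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def run_submit_queue(
    master: list[str],
    pending: Iterable[str],
    builds_ok: Callable[[list[str]], bool],
) -> tuple[list[str], list[str]]:
    """Land commits strictly one at a time (hypothetical model)."""
    queue = deque(pending)
    rejected: list[str] = []
    while queue:
        commit = queue.popleft()
        # Stand-in for "rebase against master": the candidate is the
        # current master plus this single commit.
        candidate = master + [commit]
        # Stand-in for "build the code and run the unit tests".
        if builds_ok(candidate):
            master = candidate  # merge into master
        else:
            rejected.append(commit)  # master is left untouched
    return master, rejected
```

Because every commit is tested on top of whatever master is at that moment, a commit that only breaks in combination with an earlier one is rejected instead of landing.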


I can guarantee you that the system described in the paper is what we use in production. The blog post you are pointing to was meant to describe the usage of the monorepo at Uber and the challenges we faced at a high level. It didn't dive deep into the submission system, and we have the paper to address that :-).

(I'm one of the authors as well as the tech-lead of the system.)

The paper is fantastic and the system sounds brilliant. Thanks for writing it and sharing your experiences. Don't let HN's characteristic middlebrow dismissals get you down.
That's because the research-y part is "how do we pick which commit to enqueue next", and that's harder to answer succinctly.
You can get most of the benefit on smaller scales by building feature branches and ensuring they pass unit tests, deployment and integration testing before they're allowed to be merged to mainline.

It still depends on well written tests, lest your confidence be dashed when a human starts pushing buttons and pulling levers.

Also, don't break up tightly coupled code/modules into separate repos for the sake of microservices. Hard-working developers will have to do two or more builds, PRs, possibly update semvers, etc. Find the right seams. If two repos tend to always change in lockstep, think about merging.

This is what we did for a while on a project; most git hosts nowadays have options that force any PR to be up to date with master before merging on the one hand, and to have a green pipeline on the other. That works fine for smaller projects, but because it's not automated you end up with quite a bit of manual labor (rebase on master, push, wait for CI, discover someone else merged into master first, repeat).
It isn't just that the tests need to be good and humans break things sometimes. At a certain scale, the following happens enough to be a problem:

- changeset A is submitted, an integration branch is cut from latest master, and CI begins

- changeset B is submitted, an integration branch is cut from latest master, and CI begins

- changeset A's integration branch passes CI build/test, so A is merged into master

- changeset B's integration branch passes CI build/test, so B is merged into master

- however, changeset A + B interact in such a way that causes build and/or tests to fail

- build is now broken
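
The sequence above can be reproduced in a few lines. This is a toy model under stated assumptions: `ci_passes` is a hypothetical stand-in for the build and test run, and the rule that A and B only fail together is invented for illustration.

```python
from __future__ import annotations


def ci_passes(changes: frozenset[str]) -> bool:
    """Hypothetical CI: A and B are each fine alone but break together."""
    return not {"A", "B"} <= changes


# Both integration branches are cut from the same master snapshot,
# so neither one contains the other changeset.
master_at_branch_time: frozenset[str] = frozenset()
a_branch = master_at_branch_time | {"A"}
b_branch = master_at_branch_time | {"B"}

a_green = ci_passes(a_branch)  # A passes on its own branch
b_green = ci_passes(b_branch)  # B passes on its own branch

# Both get merged because both were green in isolation.
master = a_branch | b_branch
master_green = ci_passes(master)  # the combination was never tested
```

Both branches are green, yet `master_green` comes out false, because no CI run ever saw A and B together.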

You're probably thinking "that sounds like it wouldn't happen very often. Both changes would need to be submitted within some window such that changeset B's integration branch does not include changeset A, and vice versa". That is correct, but that's where the scale comes in. With enough engineers this starts happening more, and the more engineers you have, the more unacceptable it is to have the build broken for any amount of time. And the more engineers you have, the more code you have, so any individual build takes longer, which lengthens the window during which the two conflicting changes could be submitted.

You need to do it in a way that serializes the changes because that's the only way to prevent this, but that takes too long. So the paper is about how to solve this problem.

This is why you rebase and test before each feature branch is to be merged to master. The only issue comes up when someone decides to merge while someone else has already rebased and is running their tests... but when they try to merge to master they will see their branch is out of date and that they need to rebase again. For small teams, it's easy enough to let everyone know that you're merging and not to merge anything else in the meantime. In a larger company, I've seen queue tools that give teams a 'ticket' for their turn to merge into master. It's a little clunky, and probably wouldn't scale to huge engineering teams... but sometimes low tech solutions work just as well.
A submit queue makes sense and is used by lots of people; it's the "machine learning" applied to choosing which commits to enqueue that I found interesting. If the master success rate was already 99% in 2017 with just the submit queue, why build the complex ML stuff?
As far as I understand, what you're describing is a simple submit queue that builds only one commit at a time.

What they are describing here is detecting when changes do not conflict (beyond a simple merge conflict) and building and committing them simultaneously, which increases the throughput of the submit queue system.
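
A rough sketch of that idea, not the paper's actual algorithm: assume each change comes with the set of build targets it touches (the target names and `independent_batches` are hypothetical), and treat two changes as independent when those sets are disjoint. Changes in the same batch could then be built and landed in parallel.

```python
from __future__ import annotations


def independent_batches(changes: dict[str, set[str]]) -> list[list[str]]:
    """Greedily group changes whose touched-target sets are pairwise disjoint.

    `changes` maps a change name to the (assumed known) set of build
    targets it affects. Each returned batch holds changes that share no
    target with one another.
    """
    batches: list[tuple[list[str], set[str]]] = []
    for name, targets in changes.items():
        for members, used in batches:
            if used.isdisjoint(targets):
                members.append(name)
                used |= targets  # in-place update of this batch's targets
                break
        else:
            # Overlaps with every existing batch: start a new one.
            batches.append(([name], set(targets)))
    return [members for members, _ in batches]
```

Disjoint target sets are only an approximation of "cannot conflict"; a real system would need a more careful notion of dependency between changes.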

At a 99% success rate, if there are 1000 commits per day (which wouldn't be that many), that's 10 master breaks per day.
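
The arithmetic, taking the 99% master success rate quoted upthread as the per-commit success rate (an assumption; `expected_master_breaks` is just an illustrative name):

```python
def expected_master_breaks(commits_per_day: int, success_rate: float) -> float:
    """Expected number of commits per day that break master."""
    return commits_per_day * (1.0 - success_rate)


# 1000 commits/day at a 99% success rate: roughly 10 breaks per day
# (floating-point rounding aside).
breaks_per_day = expected_master_breaks(1000, 0.99)
```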
And at least in my limited experience, the impact of master being broken is pretty big, and even bigger when you have multiple teams. Either you block master, which means far fewer than those 1000 commits make it to master that day, or you continue merging stuff in, which makes the root cause of the breakage fuzzy. And if your reporting is not in order, that is, if the person who broke the build isn't told they did, there will be a lot of "Who broke master?" and people looking at each other to find out who is going to look into it. That's not scalable.
That's why master shouldn't be whatever you're about to deploy for the first time, but the known-good version that has been burned-in on prod (for an hour or a week or whatever), so if you have to abandon a release you don't have an entire team who already rebased on top of it.
The way we achieved the master success rate of 100% at scale was by using the techniques that we describe in the paper. The blog doesn't go into details on Submit Queue and how it works.

Just to clarify, the ML models are used to predict the probability that a given change will succeed against master, as well as the probability of a conflict between changes.

The Submit Queue is the name of the complex ML stuff.