by ricardobeat 2621 days ago
No, they haven't. This is a system to queue commits, not a simple CI setup. This problem only comes up when you start having contention due to commit volume in a monorepo (think thousands of commits/day). This is only the 3rd one I've heard about.

> This paper introduces a change management system called SubmitQueue that is responsible for continuous integration of changes into the mainline at scale while always keeping the mainline green. Based on all possible outcomes of pending changes, SubmitQueue constructs, and continuously updates a speculation graph that uses a probabilistic model, powered by logistic regression. The speculation graph allows SubmitQueue to select builds that are most likely to succeed, and speculatively execute them in parallel. Our system also uses a scalable conflict analyzer that constructs a conflict graph among pending changes. The conflict graph is then used to (1) trim the speculation space to further improve the likelihood of using remaining speculations, and (2) determine independent changes that can commit in parallel

4 comments

The problem also occurs if your CI Build + Test steps take a while to run, even on a small team pushing dozens of commits per day.

Two code-conflict-free changes may pass a pre-merge build+test cycle independently but may logically break one another if both changes are merged into master. Using a submit/merge queue guarantees that each change has passed tests with the exact ordering of commits it would be merged onto. The example described here is a better explanation: https://github.com/bors-ng/bors-ng#but-dont-githubs-protecte...
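A toy illustration of that failure mode, as a minimal sketch: the "build" here is a hypothetical check that every called function is defined somewhere, and the file names, `foo`/`bar`, and both changes are made up for the example (they are not from bors-ng or the paper). One change renames a function and updates its callers; the other adds a new caller of the old name in a new file, so the two merge without a textual conflict.

```python
def passes(repo):
    """Toy 'build + test': every called function must be defined somewhere."""
    defined = set().union(*(f["defines"] for f in repo.values()))
    called = set().union(*(f["calls"] for f in repo.values()))
    return called <= defined

# Hypothetical starting state of master.
base = {
    "lib.py": {"defines": {"foo"}, "calls": set()},
    "app.py": {"defines": set(), "calls": {"foo"}},
}

def rename_foo_to_bar(repo):
    """Change A: rename foo -> bar and update the existing caller."""
    new = dict(repo)
    new["lib.py"] = {"defines": {"bar"}, "calls": set()}
    new["app.py"] = {"defines": set(), "calls": {"bar"}}
    return new

def add_new_caller(repo):
    """Change B: add a new file that still calls foo."""
    new = dict(repo)
    new["tool.py"] = {"defines": set(), "calls": {"foo"}}
    return new

branch_a = rename_foo_to_bar(base)                 # tested on its own branch
branch_b = add_new_caller(base)                    # tested on its own branch
merged = add_new_caller(rename_foo_to_bar(base))   # what master becomes

print(passes(branch_a), passes(branch_b), passes(merged))
```

Each branch passes against the base it was tested on, but the merged result does not. A queue that tests each change on top of everything ahead of it would catch this before master moves.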

I don't quite understand the problem they are trying to solve. Are there so many change sets that they couldn't provision enough CI servers, hence the "speculation graph with probabilistic model"?
Sort of, though not really.

Imagine I have three changes, C1 modifies F1, C2 modifies F2, and C3 modifies F1. There's no relation between F1 and F2.

At a low-ish rate of submission, you test and commit C1, then test and commit C2; then, when you try to test and commit C3, you rebase, re-test, and commit. (The merge doesn't conflict, so it can be resolved automatically.)

Now assume all three changes are submitted by 3 different engineers in the span of a minute and engineers don't want to manually rebase. The time between changes is now less than the rebase/build/submit time!

So you have a tool that queues up the changes, and at each change you

1. Rebase onto current head

2. Build and test with the new changes

3. Submit
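The three steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the mainline is modelled as a list of change IDs, "rebase" is just appending to that list, and `passes` is a hypothetical stand-in for a real build-and-test run.

```python
from collections import deque
from typing import Callable, List

def run_queue(head: List[str],
              pending: List[str],
              passes: Callable[[List[str]], bool]) -> List[str]:
    """Process queued changes strictly one at a time; return the new mainline."""
    queue = deque(pending)
    while queue:
        change = queue.popleft()
        candidate = head + [change]   # 1. rebase onto current head
        if passes(candidate):         # 2. build and test with the new change
            head = candidate          # 3. submit
        # A failing change is simply dropped, so head stays green.
    return head

# Example: pretend any build containing C2 fails.
print(run_queue([], ["C1", "C2", "C3"], lambda build: "C2" not in build))
```

Every iteration waits for a full build before the next change is even looked at, so the total time is the sum of all build times.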

But that's still really slow, since everything is sequential. If my change takes ~30m to test, it blocks everyone else who depends on my change.

So OK, do things in parallel: build and test C1, C1+C2, and C1+C2+C3. Then, as soon as C1 is finished testing, you can submit all 3. There are still 2 problems though: C2 is unreasonably delayed, and "what if C1 is broken?"
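A rough sketch of that parallel variant, with the same toy model as an assumption: a build is a list of change IDs, and `passes` is a hypothetical stand-in for build-and-test. All cumulative builds run at once, and the longest prefix whose builds all passed is committed.

```python
from concurrent.futures import ThreadPoolExecutor

def prefixes(pending):
    """[C1, C2, C3] -> [[C1], [C1, C2], [C1, C2, C3]]"""
    return [pending[:i] for i in range(1, len(pending) + 1)]

def submit_speculatively(pending, passes):
    """Build every cumulative prefix in parallel, then commit the longest green one."""
    builds = prefixes(pending)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(passes, builds))  # results keep the order of builds
    committed = []
    for build, ok in zip(builds, results):
        if not ok:
            break          # everything stacked on a broken change is wasted work
        committed = build
    return committed

pending = ["C1", "C2", "C3"]
print(submit_speculatively(pending, lambda build: True))
print(submit_speculatively(pending, lambda build: "C1" not in build))
```

The second call shows the "what if C1 is broken" case: every speculative build included C1, so nothing can be committed and all three builds have to be redone without it.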

So, if C2 and C1 don't conflict, you can actually just submit C2 before C1 even though the request to submit was made after. But when there really is a dependency, like C3 and C1, the question is: do I build and test {C1, C1+C3}, {C3, C3+C1}, or something else? SubmitQueue appears to try to address that question: "Given potentially conflicting changes (not at a source level but at a transitive closure level), how do I order them so that the most changes succeed the fastest, assuming some changes can fail, and I have enough processing power to run some, but not all, permutations of changes in parallel?"
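To make the "which changes are independent" part concrete, here is a minimal sketch. It assumes each change comes with the set of things it affects (files here, though a real system would presumably use the transitive closure of build targets) and treats two changes as conflicting when those sets overlap. That overlap rule and the function name are assumptions for illustration, not SubmitQueue's actual conflict analyzer.

```python
from itertools import combinations
from typing import Dict, List, Set

def independent_groups(affected: Dict[str, Set[str]]) -> List[List[str]]:
    """Group changes into connected components of the conflict graph."""
    changes = list(affected)
    neighbours = {c: set() for c in changes}
    for a, b in combinations(changes, 2):
        if affected[a] & affected[b]:      # overlapping footprint => conflict edge
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in changes:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:                       # depth-first walk of one component
            node = stack.pop()
            group.append(node)
            for n in neighbours[node]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        groups.append(sorted(group))
    return groups

print(independent_groups({"C1": {"F1"}, "C2": {"F2"}, "C3": {"F1"}}))
```

For the three changes above this yields one group with C1 and C3 and a separate group with C2. Groups can be tested and committed in parallel; the ordering question only remains inside a group.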

awesome explanation
Say you change test A, and I change test B. Before, they both expect a value VAL to be 1, but now test A expects it to be 2 and test B expects it to be 3. We both submit a change, and both changes pass CI on their respective branches. You merge your change into master since it looks OK, and I do too. Now master is broken. Womp womp.
I think they are more common than you are thinking. I am familiar with several, even going back to the svn and cvs era, all of which predated the whole formalized-and-named CI/CD thing. In my experience, we called this model submit-to-commit, and depending on the specific manifestation, it worked with diffs or branches. I'm talking 1990s.

The fancy bits in this implementation from the paper are interesting but the model itself is not that unusual.

In companies like that, is there any consideration given to minimizing conflict-prone actions like, say, renaming functions, an activity that could conflict with any commit that uses the old function name, but which in itself is unlikely to break anything? Maybe certain commits could be scheduled over the weekend?

I guess I just have a hard time imagining how many busy developers really commit important work all at once on large projects...