

	by 7e 2620 days ago
	Is this novel? Other companies have had this for ages.

3 comments

ricardobeat 2620 days ago

No, they haven't. This is a system to queue commits, not a simple CI setup. This problem only comes up when you start having contention due to commit volume in a monorepo (think thousands of commits/day). This is only the 3rd one I've heard about.

> This paper introduces a change management system called SubmitQueue that is responsible for continuous integration of changes into the mainline at scale while always keeping the mainline green. Based on all possible outcomes of pending changes, SubmitQueue constructs, and continuously updates a speculation graph that uses a probabilistic model, powered by logistic regression. The speculation graph allows SubmitQueue to select builds that are most likely to succeed, and speculatively execute them in parallel. Our system also uses a scalable conflict analyzer that constructs a conflict graph among pending changes. The conflict graph is then used to (1) trim the speculation space to further improve the likelihood of using remaining speculations, and (2) determine independent changes that can commit in parallel.
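One simplified way to read the "speculation graph" idea is sketched below. It is not the paper's implementation: the success probabilities are hard-coded hypothetical numbers (the abstract says the real ones come from a logistic regression model), outcomes are assumed independent, every pending change is assumed to conflict with every other, and builds are ranked by how likely they are to be the build that is actually needed. `speculation_space` and `pick_builds` are made-up names.

```python
from itertools import product
from typing import Dict, List, Tuple

Build = Tuple[str, ...]  # changes included in the build; the last one is under test


def speculation_space(p_success: Dict[str, float]) -> List[Tuple[Build, float]]:
    """Enumerate every build that might be needed and how likely it is to be needed.

    A build for change Ci assumes an outcome (land / reject) for each earlier
    change. Assuming independent outcomes, the chance that this build is the one
    actually needed is the product of those outcome probabilities.
    """
    order = list(p_success)
    space: List[Tuple[Build, float]] = []
    for i, change in enumerate(order):
        earlier = order[:i]
        for outcome in product([True, False], repeat=len(earlier)):
            prob = 1.0
            included = []
            for name, landed in zip(earlier, outcome):
                prob *= p_success[name] if landed else 1.0 - p_success[name]
                if landed:
                    included.append(name)
            space.append((tuple(included) + (change,), prob))
    return space


def pick_builds(p_success: Dict[str, float], workers: int) -> List[Build]:
    """Spend a limited number of workers on the builds most likely to be needed."""
    ranked = sorted(speculation_space(p_success), key=lambda item: -item[1])
    return [build for build, _ in ranked[:workers]]
```

With three pending changes there are 7 possible builds; given only 3 workers and high success probabilities, this sketch picks the optimistic chain C1, C1+C2, C1+C2+C3.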


roshanj 2620 days ago

The problem also occurs if your CI Build + Test steps take a while to run, even on a small team pushing dozens of commits per day.

Two code-conflict-free changes may pass a pre-merge build+test cycle independently but may logically break one another if both changes are merged into master. Using a submit/merge queue guarantees that each change has passed tests with the exact ordering of commits it would be merged onto. The example described here is a better explanation: https://github.com/bors-ng/bors-ng#but-dont-githubs-protecte...
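A toy illustration of two changes that merge cleanly but break each other. Everything here is hypothetical: the repository is modelled as a dict of file name to Python source, `build_passes` stands in for a CI run, and the rename scenario is just one common way this happens.

```python
from typing import Dict

Repo = Dict[str, str]  # hypothetical model: file name -> Python source

base: Repo = {
    "lib.py": "def fetch():\n    return 1\n",
    "app.py": "result = fetch()\n",
}
# Change A renames fetch() to load() and updates the existing caller.
change_a: Repo = {
    "lib.py": "def load():\n    return 1\n",
    "app.py": "result = load()\n",
}
# Change B adds a new file that calls the old name; it touches no file A touches.
change_b: Repo = {"report.py": "total = fetch() + 1\n"}


def merge(head: Repo, *changes: Repo) -> Repo:
    """Apply whole-file edits in order; no two changes edit the same file here."""
    merged = dict(head)
    for change in changes:
        merged.update(change)
    return merged


def build_passes(repo: Repo) -> bool:
    """Stand-in for CI: run lib.py first, then every other file, in one namespace."""
    namespace: dict = {}
    try:
        exec(repo["lib.py"], namespace)
        for name in sorted(repo):
            if name != "lib.py":
                exec(repo[name], namespace)
    except NameError:
        return False
    return True


a_alone = build_passes(merge(base, change_a))            # passes on its own branch
b_alone = build_passes(merge(base, change_b))            # passes on its own branch
both = build_passes(merge(base, change_a, change_b))     # fails once both are merged
```

A merge queue would have tested the second change on top of the first before landing it, so the failure shows up before master is touched.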


zyang 2620 days ago

I don't quite understand the problem they are trying to solve. Are there so many change sets that they couldn't provision enough CI servers, hence the "speculation graph with probabilistic model"?


joshuamorton 2620 days ago

Sort of, though not really.

Imagine I have three changes, C1 modifies F1, C2 modifies F2, and C3 modifies F1. There's no relation between F1 and F2.

At a low-ish rate of submission, you test and commit C1, then test and commit C2; then, when you try to test and commit C3, you rebase, re-test, and commit. (The merge doesn't conflict, so it can be fixed automatically.)

Now assume all three changes are submitted by 3 different engineers in the span of a minute and engineers don't want to manually rebase. The rebase/build/submit time is now greater than the time between changes!

So you have a tool that queues up the changes, and at each change you

1. Rebase onto current head

2. Build with the new changes

3. Submit
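The three steps above can be sketched as a toy serial queue. This is a minimal sketch, not any real tool's API: `Change`, `rebase`, and `process_queue` are hypothetical names, the repository is modelled as a dict of file contents, and "rebase" is simulated by applying a change's edits on top of the current head.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Repo = Dict[str, str]  # hypothetical model: file name -> contents


@dataclass
class Change:
    name: str
    edits: Repo  # files this change rewrites


def rebase(head: Repo, change: Change) -> Repo:
    """Simulate a rebase: apply the change's edits on top of the current head."""
    candidate = dict(head)
    candidate.update(change.edits)
    return candidate


def process_queue(
    head: Repo,
    changes: List[Change],
    run_tests: Callable[[Repo], bool],
) -> Tuple[Repo, List[str], List[str]]:
    """Rebase, build/test, and submit one change at a time, in queue order."""
    queue = deque(changes)
    landed: List[str] = []
    rejected: List[str] = []
    while queue:
        change = queue.popleft()
        candidate = rebase(head, change)   # 1. rebase onto current head
        if run_tests(candidate):           # 2. build with the new changes
            head = candidate               # 3. submit
            landed.append(change.name)
        else:
            rejected.append(change.name)   # head is left untouched, so it stays green
    return head, landed, rejected
```

Every change waits for all the changes ahead of it to finish testing, which is where the slowness comes from.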

But that's still really slow, since everything is sequential. If my change takes ~30m to test, it blocks everyone else who depends on my change.

So OK, do things in parallel: build and test C1, C1+C2, and C1+C2+C3. Then, as soon as C1 is finished testing, you can submit all 3. There are still two problems though: C2 is unreasonably delayed, and "what if C1 is broken".
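The parallel approach, including the "what if C1 is broken" case, might look roughly like this. It is a sketch under assumptions: `speculate_prefixes` and the `build_and_test` callback are hypothetical, and changes are just names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def speculate_prefixes(
    changes: Sequence[str],
    build_and_test: Callable[[Tuple[str, ...]], bool],
) -> Tuple[List[str], List[str], List[str]]:
    """Test every prefix (C1, C1+C2, ...) in parallel, assuming earlier changes pass.

    Returns (landed, rejected, requeue). Builds after the first failure
    included the broken change, so their results are discarded and the
    changes they covered have to be tested again.
    """
    prefixes = [tuple(changes[: i + 1]) for i in range(len(changes))]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(build_and_test, prefixes))

    landed: List[str] = []
    for change, passed in zip(changes, results):
        if not passed:
            failed_at = len(landed)
            return landed, [change], list(changes[failed_at + 1:])
        landed.append(change)
    return landed, [], []
```

If C1 is broken, nothing lands and both speculative builds that stacked on top of it were wasted, which is the cost a smarter scheduler tries to reduce.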

So, if C2 and C1 don't conflict, you can actually just submit C2 before C1 even though the request to submit was made after. But when there really is a dependency, like C3 and C1, the question is: do I build and test {C1, C1+C3}, {C3, C3+C1}, or something else? SubmitQueue appears to try to address that question: "Given potentially conflicting changes (not at a source level but at a transitive closure level), how do I order them so that the most changes succeed the fastest, assuming some changes can fail, and I have enough processing power to run some, but not all, permutations of changes in parallel?"
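Using the C1/C2/C3 example, a minimal conflict check could group changes that can never affect each other. This sketch simplifies heavily: it compares touched files directly, whereas the comment above notes the real analysis works on the transitive closure of what a change affects. `conflict_graph` and `independent_groups` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, List, Set


def conflict_graph(touched: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Connect two changes when the sets of things they affect overlap."""
    graph: Dict[str, Set[str]] = {name: set() for name in touched}
    for a, b in combinations(touched, 2):
        if touched[a] & touched[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def independent_groups(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Connected components: changes in different groups never interact."""
    seen: Set[str] = set()
    groups: List[List[str]] = []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in graph[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# C1 and C3 both modify F1; C2 modifies the unrelated F2.
touched = {"C1": {"F1"}, "C2": {"F2"}, "C3": {"F1"}}
groups = independent_groups(conflict_graph(touched))
```

Here C2 lands in its own group, so it can be tested and committed without waiting for C1 or C3; only the C1/C3 group needs an ordering decision.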


jrochkind1 2620 days ago

awesome explanation


yzmtf2008 2620 days ago

Say you change test A, and I change test B. Before, they both expect a value VAL to be 1, but now test A expects it to be 2 and test B expects it to be 3. We both submit a change, and both changes pass CI on their respective branches. You merge your change into master since it looks OK, and I do too. Now master is broken. Womp womp.


foobiekr 2619 days ago

I think they are more common than you are thinking. I am familiar with several, even going back to the svn and cvs era, all of which predated the whole formalized-and-named CI/CD thing. In my experience, we called this model submit-to-commit and, depending on the specific manifestation, it worked with diffs or branches. I'm talking 1990s.

The fancy bits in this implementation from the paper are interesting but the model itself is not that unusual.


UweSchmidt 2619 days ago

In companies like that, is there any consideration given to minimizing conflict-prone actions, like, say, renaming functions, an activity that could conflict with any commit that uses the old function name, but which in itself is unlikely to break anything? Maybe certain commits could be scheduled over the weekend?

I guess I just have a hard time imagining how many busy developers really commit important work all at once on large projects...


ungzd 2620 days ago

Other companies often just wait for tests to finish, even though the proposed changes (branch/PR) might not be based on the current version of master at the time the tests run. Then they just rebase/merge after tests pass, without running tests again. For smaller projects, this rarely breaks. For a monorepo with lots of committers, the rate of breakage becomes too large.

The next step is to serialize all proposed changes, so they are rebased one on top of the other before running tests. This eliminates breakage due to merging, but does not scale:

> The simplest solution to keep the mainline green is to enqueue every change that gets submitted to the system. A change at the head of the queue gets committed into the mainline if its build steps succeed.
>
> This approach does not scale as the number of changes grows. For instance, with a thousand changes per day, where each change takes 30 minutes to pass all build steps, the turnaround time of the last enqueued change will be over 20 days.
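The "over 20 days" figure in the quote can be checked with quick arithmetic, assuming a strictly serial queue where each change waits for every change ahead of it:

```python
changes_per_day = 1000
minutes_per_change = 30

total_minutes = changes_per_day * minutes_per_change  # strictly serial queue
turnaround_days = total_minutes / (60 * 24)           # 30000 min / 1440 min per day
print(f"{turnaround_days:.1f} days")                  # prints "20.8 days"
```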

This paper is about scaling a variant of such a queue.

Which companies?

Us, for instance.[0,2]

But sure enough, we definitely weren't the first to go down this path. Facebook was using (or developing the tech for) server-side rebasing in 2015.[1] GitLab provides native server-side rebase functionality, likely inspired by various parties already having developed tools to do the same.

These aren't new ideas. But handling them at the scale where you land hundreds or even thousands of commits a day to a repo and require the ability to deploy at will, that's where engineering comes into play.

0: https://smarketshq.com/marge-bot-for-gitlab-keeps-master-alw...

1: https://softwareengineering.stackexchange.com/questions/2787...

2: https://github.com/smarkets/marge-bot


lozenge 2619 days ago

Yours seems identical to Bors, just for GitLab instead of GitHub? That isn't really what's described in the OP.


bostik 2619 days ago

Yup, pretty much. I was mostly answering the parent, who in turn was questioning the lack of novelty.

The concept of an evergreen master with testing done in branches, followed by automated merges/rebases, is not special. Quite a few companies have been doing it for years; it's the off-the-shelf tooling and subsequent publicity that haven't necessarily been around as long.

As for OP's material? The automated conflict resolution via reordering to optimise parallelism - that certainly feels novel.
