by Scaevolus 2619 days ago
There's a nice middle ground between this and a one-at-a-time submit queue: have a speculative batch running on the side. This gives nice speedups (approaching N times more commits, where N is the batch size) with minimal complexity.

One useful metric is the ratio between test time and the number of commits per day. If your tests run in a minute, you can test submissions one at a time and still have a thousand successful commits each day. If your tests take an hour, you can have at most 24 changes per day under a one-at-a-time scheme.
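
To make the arithmetic above concrete, here is a rough sketch of the upper bounds. The function names are made up for illustration, and the batched figure is an idealized ceiling that assumes every test run passes and every batch is full.

```python
MINUTES_PER_DAY = 24 * 60


def max_serial_merges_per_day(test_minutes: float) -> int:
    """Upper bound on merges per day when changes are tested one at a time."""
    return int(MINUTES_PER_DAY // test_minutes)


def max_batched_merges_per_day(test_minutes: float, batch_size: int) -> int:
    """Idealized upper bound when each passing run merges a full batch."""
    return max_serial_merges_per_day(test_minutes) * batch_size


print(max_serial_merges_per_day(1))       # 1440: comfortably over a thousand
print(max_serial_merges_per_day(60))      # 24: the one-hour case
print(max_batched_merges_per_day(60, 5))  # 120: one-hour tests, batches of 5
```

Real numbers will be lower, since flaky or failing runs and partially filled batches both eat into the ceiling.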

I worked on Kubernetes, where test runs can take more than an hour, since spinning up VMs to test things is expensive! The submit queue tests both the top of the queue and a batch of a few (up to 5) changes that can be merged without a git merge conflict. If either one passes, the changes it covers are merged. Batch tests aren't cancelled if the top of the queue passes, so sometimes you'll merge both the top of the queue AND the batch, since they're compatible.
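
A minimal sketch of one round of that scheme, not the actual Kubernetes submit queue code. Everything here is a simplifying assumption: `Change`, `conflicts`, `pick_batch` and `run_round` are hypothetical names, "conflict" is approximated as two changes touching the same file rather than a real git merge, the batch is assumed to start from the head of the queue, and the two test runs are evaluated one after the other instead of concurrently.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

MAX_BATCH = 5  # "up to 5" changes per batch, per the comment above


@dataclass(frozen=True)
class Change:
    id: int
    files: frozenset  # hypothetical stand-in for what the change touches


def conflicts(a: Change, b: Change) -> bool:
    # Assumption: overlapping files approximate a git merge conflict.
    return bool(a.files & b.files)


def pick_batch(queue: Sequence[Change], max_size: int = MAX_BATCH) -> List[Change]:
    """Greedily take changes, in queue order, that do not conflict with each other."""
    batch: List[Change] = []
    for change in queue:
        if len(batch) == max_size:
            break
        if not any(conflicts(change, other) for other in batch):
            batch.append(change)
    return batch


def run_round(
    queue: Sequence[Change],
    tests_pass: Callable[[Sequence[Change]], bool],
) -> List[Change]:
    """Test the head alone and a batch; merge whatever passes."""
    if not queue:
        return []
    head = [queue[0]]
    batch = pick_batch(queue)
    merged: List[Change] = []
    if tests_pass(head):
        merged.extend(head)
    if tests_pass(batch):  # not cancelled even if the head already passed
        merged.extend(c for c in batch if c not in merged)
    return merged
```

For example, with three queued changes where the first two touch the same file, `pick_batch` skips the second one, and a round where all tests pass merges the first and third.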

Here are some recent batches: https://prow.k8s.io/?repo=kubernetes%2Fkubernetes&type=batch

And the code to pick batches: https://github.com/kubernetes/test-infra/blob/0d66b18ea7e8d3...

Merges to the main repo peak at about 45 per day, largely depending on the volume of changes. The important thing is that the queue size remains small: http://velodrome.k8s.io/dashboard/db/monitoring?orgId=1&pane...

2 comments

The paper mentions Zuul as a previous work, but notes that batching has downsides:

> Optimistic execution of changes is another technique being used by production systems (e.g., Zuul [12]). Similar to optimistic concurrency control mechanisms in transactional systems, this approach assumes that every pending change in the system can succeed. Therefore, a pending change starts performing its build steps assuming that all the pending changes that were submitted before it will succeed. If a change fails, then the builds that speculated on the success of the failed change needs to be aborted, and start again with new optimistic speculation. Similar to the previous solutions, this approach does not scale and results in high turnaround time since failure of a change can abort many optimistically executing builds. Moreover, abort rate increases as the probability of conflicting changes increase (Figure 1).

The same thing was (is?) done in OpenStack with Zuul, I believe. When you're going to merge something, your branch goes on top of the changes already going through the CI.
We talked to the Zuul team. They use more parallelism, but it's similar: https://zuul-ci.org/docs/zuul/user/gating.html

Most of the complexity and suffering of a submit queue stems from the interactions between your VCS and CI systems. Keeping things simple is great! Kubernetes' CI system is Prow, which runs the tests as pods in a Kubernetes cluster. Dogfooding like this is great, since the team you're providing CI for can also help fix bugs that arise.

Yes, I recently switched my org to using Zuul for this purpose; by having an internal speculative queue of future states for master you can have multiple pending changes tested at once, while also ensuring that the tested code is exactly what goes into master. So far it's been a really good experience, in particular as our tests take a long time.
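
A toy model of that speculative queue, based only on the description in this thread and not on Zuul's actual code. `speculative_states`, `gate` and the `tests_pass` callback are hypothetical names. Each pending change is tested on top of master plus every change ahead of it; when one fails, it is dropped and everything behind it is re-tested without it.

```python
from typing import Callable, List, Sequence, Tuple


def speculative_states(pending: Sequence[str]) -> List[Tuple[str, ...]]:
    """For each change, the changes its run assumes: all those ahead, plus itself."""
    return [tuple(pending[: i + 1]) for i in range(len(pending))]


def gate(
    queue: Sequence[str],
    tests_pass: Callable[[Tuple[str, ...]], bool],
) -> Tuple[List[str], List[str]]:
    """Return (merged, rejected) after draining the queue."""
    merged: List[str] = []
    rejected: List[str] = []
    pending = list(queue)
    while pending:
        # In a real system these runs would execute in parallel.
        results = [
            tests_pass(tuple(merged) + state)
            for state in speculative_states(pending)
        ]
        if all(results):
            merged.extend(pending)
            pending = []
        else:
            first_bad = results.index(False)
            merged.extend(pending[:first_bad])   # tested exactly as merged
            rejected.append(pending[first_bad])
            pending = pending[first_bad + 1:]    # re-speculate without it
    return merged, rejected


# Change "b" breaks the build; "c" and "d" get re-tested without it.
print(gate(["a", "b", "c", "d"], lambda state: "b" not in state))
# (['a', 'c', 'd'], ['b'])
```

The re-test in the failure branch is the cost the paper quote above is pointing at: one bad change throws away every run that speculated on it.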

It sounds like Uber's thing has a lot more smarts regarding deciding what gets tested. For the scale I work at (<200k lines of code) that isn't necessary.