by adamb 3629 days ago
I don't know how large a large project is, but our system is pretty large. We build and test for 4 different operating system flavors, and far more than that if you count specific versions and distributions. We run end-to-end user tests against our applications that exercise functionality across many of these operating systems. We have broken our tests up into functional groups. Each group uses parallelism and caching internally, and the groups themselves run in parallel. In some cases a single developer or build slave has used 40 machines at once to run these tests (this number was only limited by our budget... Windows machines are extra expensive on EC2).

In terms of reporting on tests that run in parallel, we built a tool that specializes in exactly that. It collates output from parallel tests, times out tests that hang, and makes sure the build system doesn't kill the runner when tests are silent for too long. It also tracks which tests have run against which versions of the codebase in the past and what their outcomes were. We use supporting tools to analyze test flakiness and understand when flaky tests were introduced. We have had a lot of success with this approach, as debugging weirdness across many tests is less miserable for developers when they can use the same tools that CI does.
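To make the collation, timeout, and "too silent" points concrete, here is a minimal sketch in Python. It is not our actual tool. The names run_one, run_all, timeout_s, and heartbeat_s are made up for illustration, and it assumes each test is a plain command whose exit code signals pass or fail.

```python
import subprocess
import sys
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_one(name, cmd, timeout_s):
    """Run one test command; return (name, outcome, captured output)."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before raising, so a hung test
        # does not linger. Partial output may come back as bytes.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return name, "timeout", out
    outcome = "pass" if proc.returncode == 0 else "fail"
    return name, outcome, proc.stdout


def run_all(tests, timeout_s=60, heartbeat_s=30, workers=4):
    """Run tests in parallel and print each test's output as one block."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {
            pool.submit(run_one, name, cmd, timeout_s)
            for name, cmd in tests.items()
        }
        while pending:
            done, pending = wait(
                pending, timeout=heartbeat_s, return_when=FIRST_COMPLETED
            )
            if not done:
                # Nothing finished during this interval: emit a heartbeat so
                # a build system watching for silence does not kill the run.
                print(f"... still running {len(pending)} test(s)", flush=True)
            for fut in done:
                name, outcome, output = fut.result()
                results[name] = outcome
                # Output is buffered per test, so parallel tests never
                # interleave their lines in the report.
                print(f"=== {name}: {outcome} ===", flush=True)
                print(output, end="", flush=True)
    return results


if __name__ == "__main__":
    # Hypothetical test commands, one per outcome.
    demo = {
        "ok": [sys.executable, "-c", "print('hello')"],
        "bad": [sys.executable, "-c", "import sys; sys.exit(1)"],
        "hung": [sys.executable, "-c", "import time; time.sleep(30)"],
    }
    print(run_all(demo, timeout_s=3, heartbeat_s=1))
```

Because this is an ordinary script, a developer can run it locally exactly as CI would, which is the property the rest of this comment is about. Recording results per codebase version would be a natural next step but is left out of the sketch.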

Critically, when bugs in those tools are discovered, developers can pinpoint and fix those bugs locally with reasonable ease. Deploying fixes to the test runner (or the logic that allocates workers for the test runner) is like any other change. No need to tinker with Jenkins (or buildbot, etc) config. No need to take the build system down to test that the change is correct. No need to bring up a test version of the build system and experiment with your change there.

We've gone to great lengths to make our system something that's a joy to work with, and it has helped us be very productive across the many different environments we need to operate in.

It's tough to know how much detail is appropriate in comment threads like these. You're absolutely right that there's a lot that needs to come together to make something like what I've described work. I know because we pulled enough of it together to support our own large and heterogeneous projects.

It sounds like you have also thought about this problem a lot. Can you share more about the sorts of tests (language, test library, etc.) you have? Perhaps we can break new ground where our respective experiences and intuitions intersect.

1 comment

We have a similar design but need to juggle JS and Python in some interesting ways. There are only so many variations on the theme of CI, so I'm not surprised about the convergent design. Our environment is not as heterogeneous, and we leverage pre-baked AMIs and LXC containers for isolation and reproducibility.

My contention was with the emphasis on local reproducibility. In the past I would have said yes, local reproducibility should be a feature of any well-designed CI pipeline, but nowadays I'm not so sure.

Local development environments are optimized for iteration speed at the cost of reproducibility and stability. Whether this is the right decision or not can be debated. A CI environment, on the other hand, is designed for reproducibility and stability. Those sets of requirements are somewhat at odds, and you can't optimize for all of them at the same time. Tools should be shared across local and CI environments as much as possible, but not when sharing compromises the requirements of either environment.