Hacker News
by dkarapetyan 3639 days ago
Here's a problem statement for you. You have ~12k tests that take more than 40 hours to run sequentially. What do you do?

I know how we've solved the problem: we provide as much validation as possible before shipping something to production, and we do it at pretty high rates of code churn. What you're suggesting, by contrast, is untenable on a large enough project. That's like saying drink your milk and have a hearty breakfast. Nice platitudes, but not actual engineering. Our solution is not unique, in fact. Shopify and other big shops follow the exact same practices (https://www.youtube.com/watch?v=zWR477ypEsc). That's not because they don't know any better or haven't heard of setting up proper build pipelines using principles from immutable infrastructure, but because at large enough scale you need mutability.

Jenkins was just an example. We don't use Jenkins, but you do need something that manages workers and their lifecycle. Saying "reduce your test runtime to 5 minutes and have better engineers and tools" doesn't cut it.
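For a sense of scale, here is a rough back-of-envelope sketch using only the figures quoted in this comment (~12k tests, more than 40 hours sequentially, a 5 minute target). The function names are made up for illustration, and the worker count assumes perfectly balanced shards with zero startup overhead, so treat it as a lower bound rather than a plan.

```python
import math

TOTAL_TESTS = 12_000      # "~12k tests" from the comment; approximate
SEQUENTIAL_HOURS = 40     # "more than 40 hours", so this is a lower bound
TARGET_MINUTES = 5        # the "5 minutes" target mentioned in the comment


def average_seconds_per_test(total_tests: int, sequential_hours: float) -> float:
    """Mean runtime of a single test if the suite is run one test at a time."""
    return sequential_hours * 3600 / total_tests


def workers_needed(sequential_hours: float, target_minutes: float) -> int:
    """Ideal worker count, assuming perfect load balancing and no overhead."""
    return math.ceil(sequential_hours * 60 / target_minutes)


if __name__ == "__main__":
    print(average_seconds_per_test(TOTAL_TESTS, SEQUENTIAL_HOURS))  # 12.0
    print(workers_needed(SEQUENTIAL_HOURS, TARGET_MINUTES))         # 480
```

At roughly 12 seconds per test on average, hitting 5 minutes of wall time means on the order of 480 workers running at once, which is why something has to manage those workers and their lifecycle.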


Good discussion guys. Please keep going.

Isn't the architecture of your build directly related to both the architecture of your system and your deployment?

If so, why would somebody think that a monolithic app, even one with threading and workers built in, would be better than simply engineering your own as you go along? After all, this is supposed to be engineering, right? Not "How to use Jenkins".

I agree that platitudes aren't solutions, but code smells are the kind of thing that leads one to actually take ownership instead of perhaps using the same paradigm, only larger, yes?

Apologies if I missed the point, dkarapetyan.

Code smell is a little ill-defined. Given two experienced enough engineers, they'll smell different things based on the experiences that have led them to that point. The rough general guideline is, I guess, "things should be as simple as possible but not simpler", and depending on which set of requirements you've optimized for, it might not smell right to someone who values a different set of requirements.

We suffered with a Jenkins-like solution for a long time before we decided enough was enough and that we wanted an approach that didn't need as much soul-crushing, CI-specific effort.

If any of our experiences or insights can help others in their own environments, all the better!

I don't know how large a large project is, but our system is pretty large. We build and test for 4 different operating system flavors, and way more than that if you count specific versions and distributions. We run end-to-end user tests against our applications that test functionality across many of these operating systems. We have broken up our tests into functional groups that have parallelism and caching within the groups, and the groups themselves run in parallel. In some cases a single developer or build slave has used 40 machines at once to run these tests (this number was only limited by our budget... Windows machines are extra expensive on EC2).
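To make the "parallel within groups, groups in parallel" shape concrete, here is a minimal single-machine sketch. It is not our actual tooling: the names `run_group` and `run_all` are hypothetical, a test is assumed to be a plain callable returning pass/fail, and threads stand in for what would really be separate machines. Caching is left out entirely.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Assumption for this sketch: a test is a zero-argument callable returning True on pass.
Test = Callable[[], bool]


def run_group(tests: Dict[str, Test], workers: int) -> Dict[str, bool]:
    """Run one functional group's tests in parallel within the group."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(test) for name, test in tests.items()}
        return {name: fut.result() for name, fut in futures.items()}


def run_all(groups: Dict[str, Dict[str, Test]],
            workers_per_group: int = 4) -> Dict[str, Dict[str, bool]]:
    """Run every group concurrently; each group fans out to its own pool."""
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = {
            group: pool.submit(run_group, tests, workers_per_group)
            for group, tests in groups.items()
        }
        return {group: fut.result() for group, fut in futures.items()}


if __name__ == "__main__":
    demo = {
        "ui": {"login": lambda: True, "logout": lambda: False},
        "api": {"ping": lambda: True},
    }
    print(run_all(demo))
```

Each group gets its own inner pool, so a slow group does not starve the others of workers.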

In terms of reporting on tests that run in parallel, we built a tool that specializes in exactly that. It collates output from parallel tests, times out tests that are hung, and makes sure the build system doesn't kill the run if tests are too silent. It also tracks which tests have run against which versions of the codebase in the past and what their outcomes were. We use supporting tools to analyze test flakiness and understand when it was introduced. We have had a lot of success with this approach, as debugging weirdness across many tests is less miserable for developers when they can use the same tools that CI does.
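Two of those behaviors are easy to sketch: timing out a hung test while capturing its output, and flagging flakiness from recorded outcomes. This is an illustration only, not the tool described above. The function names, the (revision, test, outcome) history format, and the rule "same revision, different outcomes means flaky" are all assumptions, and the keep-alive handling for silent tests is omitted.

```python
import subprocess
import sys
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def run_with_timeout(name: str, cmd: List[str], timeout_s: float) -> Tuple[str, str, str]:
    """Run one test command; return (name, outcome, captured output)."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return name, "timeout", ""
    outcome = "pass" if proc.returncode == 0 else "fail"
    return name, outcome, proc.stdout + proc.stderr


def flaky_tests(history: List[Tuple[str, str, str]]) -> List[str]:
    """history rows are (revision, test name, outcome).

    A test is flagged when a single revision recorded more than one
    distinct outcome for it (an assumed definition of flakiness).
    """
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for revision, test, outcome in history:
        seen[(revision, test)].add(outcome)
    return sorted({test for (_, test), outcomes in seen.items() if len(outcomes) > 1})


if __name__ == "__main__":
    print(run_with_timeout("hello", [sys.executable, "-c", "print('hi')"], 30))
    print(flaky_tests([("r1", "t1", "pass"), ("r1", "t1", "fail")]))
```

Because both functions are plain library code, a developer can run them locally against the same commands and history that CI uses.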

Critically, when bugs in those tools are discovered, developers can pinpoint and fix those bugs locally with reasonable ease. Deploying fixes to the test runner (or the logic that allocates workers for the test runner) is like any other change. No need to tinker with Jenkins (or buildbot, etc) config. No need to take the build system down to test that the change is correct. No need to bring up a test version of the build system and experiment with your change there.

We've gone to great lengths to make our system something that's a joy to work with, and it has helped us be very productive across the many different environments we need to operate in.

It's tough to know how much detail is appropriate in comment threads like these. You're absolutely right that there's a lot that needs to come together to make something like what I've described work. I know because we pulled enough of it together to support our own large and heterogeneous projects.

It sounds like you have also thought about this problem a lot. Can you share more about the sorts of tests (language, test library, etc.) you have? Perhaps we can break new ground where our respective experiences and intuitions intersect.

We have a similar design but need to juggle JS and Python in some interesting ways. There are only so many variations on the theme of CI, so I'm not surprised by the convergent design. Our environment is not as heterogeneous, and we leverage pre-baked AMIs and LXC containers for isolation and reproducibility.

My contention was with the emphasis on local reproducibility. In the past I would have said yes, local reproducibility should be a feature of any well-designed CI pipeline, but nowadays I'm not so sure.

Local development environments are optimized for iteration speed at the cost of reproducibility and stability. Whether this is the right decision or not can be debated. A CI environment, on the other hand, is designed for reproducibility and stability. Those two sets of requirements are somewhat at odds, and you can't optimize for all of them at the same time. Tools should be shared across local and CI environments as much as possible, but not when that comes at the cost of compromising the requirements of either environment.