by solatic 2049 days ago
> slow CI is probably the biggest engineering time killer in existence

If you're at the size where slow CI negatively affects your projects, then you're big enough to own your own CI (at least the build agents).

Remember that one-man projects don't need CI, and that CI for small (n<5) teams is almost never the bottleneck. These SaaS CI providers really target the open-source / small-team market and it makes sense that they wouldn't optimize for larger-scale operations.

6 comments

I disagree with this.

Even a single project with 20-minute build times is enough to slow down or frustrate development.

At the same time, I would not easily justify spending time managing CI infrastructure with my team of 6-10 people.

Things may have changed since then, but the last time I self-managed build agents, it often led to build jobs being tightly coupled to the build agent and its installed software versions. With a Docker-based CI system, you are forced to have everything specified in code, making it much more maintainable.

Additionally, hosted CI allows me to do 100 parallel builds on Linux, macOS and Windows. Perhaps this is a niche use case, but I saved a lot of time and reduced build times by an order of magnitude by switching from self-hosted CI on Windows and macOS to a hosted solution.

> At the same time, I would not easily justify spending time managing CI infrastructure with my team of 6-10 people.

The step from a docker-based build to a proper build agent is a small one. From there, running your CI yourself on a cloud provider is not particularly hard and at size will quickly be cheaper than having an intermediary.

If the whole team can fit in a single room, then the release process can easily be run on the laptop of one of the team's developers. I don't see how any team of five people consistently releases quickly enough that you would have multiple people trying to release at the same time. A small team should not require tooling to enforce that tests are run before production deploys; that should be either cultural, or part of a script that is easy enough to run on anybody's machine.
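As a rough sketch of what "part of the script" could look like: the function below runs a test command and only runs the deploy command if the tests pass. The function name `release` and both commands are hypothetical placeholders, not anything from a particular project.

```python
import subprocess
import sys


def release(test_cmd, deploy_cmd):
    """Run the tests; deploy only if they pass. Returns True if deployed."""
    tests = subprocess.run(test_cmd)
    if tests.returncode != 0:
        print("tests failed, not deploying", file=sys.stderr)
        return False
    # check=True raises CalledProcessError if the deploy itself fails.
    subprocess.run(deploy_cmd, check=True)
    return True
```

A call might look like `release(["pytest", "-q"], ["./deploy.sh", "production"])`, assuming those are the commands your project uses.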

Most teams do not need CI until they have a separation of concerns (e.g. code versus credential management) handled by different people or teams, so that not all of the decisions can be made in a single room.

I do see being able to run test matrices across multiple OS or device options as a reason for smaller teams to adopt CI early.

One-man multi-platform projects benefit strongly from CI to test the other platforms.
> If you're at the size where slow CI negatively affects your projects, then you're big enough to own your own CI (at least the build agents).

Strongly disagree. When I joined my current company, as an early engineer on a fairly new product, we had an 8-minute deploy time, and I see that as a key component of the rapid iteration cycle that was critical to the company at that stage.

We had 3 engineers and had no time to spend on owning our own CI.

One-man projects do indeed need CI. Test matrices can get large very quickly, and it is far easier to let that be somebody else's problem.

The open source Python library I maintain has over 30 instances in the test matrix of Python version x platform x implementations. I don't even have a Windows dev box handy. TravisCI and Appveyor take care of that for me.
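To illustrate how quickly such a matrix grows, here is a small sketch. The axis values are hypothetical, picked only to show the multiplication; they are not the actual matrix of the library above.

```python
from itertools import product

# Hypothetical axes, chosen only to illustrate the arithmetic.
python_versions = ["2.7", "3.4", "3.5", "3.6", "3.7", "3.8"]
platforms = ["linux", "macos", "windows"]
implementations = ["cpython", "pypy"]

# One CI job per combination of version, platform and implementation.
matrix = list(product(python_versions, platforms, implementations))
print(len(matrix))  # 6 * 3 * 2 = 36
```

In practice some combinations would be excluded, but the count still climbs past what one person can test by hand.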

> If you're at the size where slow CI negatively affects your projects, then you're big enough to own your own CI (at least the build agents).

I think you vastly underestimate how much stuff people want to fit into CI and how quickly it turns into a big blob. I work as a freelancer helping not-quite-startups-anymore with things like CI speedup and tuning the database queries emitted by their ORM. You know, things where it's easy to build up technical debt.

It's not uncommon for a team of 3-4 to build so many tests and add so many linters and whatnot that CI takes more than an hour. Often, some basic love can bring it down to ~5 minutes but many teams are so focused on building new features that they will not take time to sharpen their tools.

Would you be willing to share some easy improvements that could be made? How can a large number of tests run so much faster?
It usually comes down to parallelisation; you want to do as much work simultaneously as you can. Saving CI resources _can_ be reasonable if CI resources are very scarce relative to dev time, as in some open source projects. However, if you are paying your devs, then the extra few hundred bucks a month for beefier CI is often worth the increase in productivity. A couple of different ways:

- Oftentimes the staging of the CI build can be improved. Devs often set up CI so that linters must pass _before_ actual tests are run. Run them in parallel instead and fail the whole run if the linters don't pass. This is even more important if there are multiple linters (perhaps for different sections of the codebase) and they all get applied serially before any of the tests start.

- Obviously, split up your tests as well so they can run in parallel. If you have a project containing both JS and backend tests, don't wait for one to finish before starting the other. Many "bigger" languages also have something akin to parallel_tests (https://github.com/grosser/parallel_tests) that lets you quickly set up multiple databases to keep transactions separate, etc. It also provides tooling to remember the runtimes of previous runs and uses them to equalize the parallel tracks of subsequent runs as much as possible.

- Cache as much as possible. This is a wider topic, but dependencies, Docker layers and static assets can all be cached, and doing this correctly can hugely cut CI time on its own. You don't want to know how many projects I've seen that don't have this set up correctly (or at all).

- Longer-running projects can have hundreds of database migrations, and applying them all to an empty database can take minutes. Big frameworks like Rails can dump the schema for you in a form that loads in a second instead. Have a separate job that runs in parallel, applies all the migrations and verifies the result against the schema dump; all the other jobs just load the schema and use that.
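To make the "equalize the parallel tracks" idea concrete, here is a minimal sketch of one way to do it: sort test files by their recorded runtime and always hand the next-longest file to the least-loaded worker. This is a generic greedy heuristic, not necessarily how parallel_tests does it; the function name `balance` and the timings are made up for illustration.

```python
import heapq


def balance(durations, workers):
    """Assign test files to workers, longest first, each to the least-loaded worker."""
    # Min-heap of (total seconds assigned so far, worker index).
    heap = [(0.0, i) for i in range(workers)]
    heapq.heapify(heap)
    groups = [[] for _ in range(workers)]
    by_runtime = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, secs in by_runtime:
        load, i = heapq.heappop(heap)
        groups[i].append(name)
        heapq.heappush(heap, (load + secs, i))
    return groups


# Hypothetical runtimes in seconds, as recorded from a previous CI run.
previous_run = {
    "test_models.py": 120,
    "test_api.py": 90,
    "test_views.py": 60,
    "test_utils.py": 30,
    "test_cli.py": 30,
}
groups = balance(previous_run, workers=2)
for i, files in enumerate(groups):
    print(i, sum(previous_run[f] for f in files), files)
```

Splitting the same files alphabetically or by count can leave one worker with most of the slow tests, so the whole run waits on that one track.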

Every project I've worked on at Google and Microsoft has used cloud CI services at least in part (Travis, etc.). My current team had a custom Jenkins instance plus agents that we maintained, and we've phased them out in favor of the cloud. It just scales better, and the time our team spent maintaining agents can now be spent on writing code and fixing bugs. (We do still have people who work on integrating with the cloud CI services, but it's considerably less effort.)