by jrockway 2056 days ago
I guess this is the same pricing model as CircleCI?

I've always found "build minutes" to be a little bit of a vendor-favored pricing model. I really love wanting to do a release, and watching my CI provider take three minutes to pull down a 30MB docker image, or "npm install" running at dialup speeds. All while they're billing you per minute -- they make money by not investing in their infrastructure! I'd prefer to pay per byte transferred and CPU instruction executed -- if they make the hardware or network faster, the price stays the same, but they can do more work with their infrastructure. And if you schedule less work, the price for you goes down.
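To make the incentive concrete, here is a rough sketch comparing the two billing models for the same amount of work on slow and fast infrastructure. Every rate and job size below is invented purely for illustration, and CPU-seconds stand in for "CPU instructions executed"; nothing here reflects any real provider's pricing.

```python
from dataclasses import dataclass


@dataclass
class Job:
    minutes: float      # wall-clock time the provider would bill for
    gigabytes: float    # data transferred during the build
    cpu_seconds: float  # CPU time consumed (stand-in for instructions executed)


def per_minute_cost(job: Job, rate_per_minute: float) -> float:
    """Per-minute billing: the bill grows with wall-clock time."""
    return job.minutes * rate_per_minute


def per_resource_cost(job: Job, rate_per_gb: float, rate_per_cpu_second: float) -> float:
    """Per-resource billing: the bill depends only on the work done."""
    return job.gigabytes * rate_per_gb + job.cpu_seconds * rate_per_cpu_second


if __name__ == "__main__":
    # Hypothetical: identical work, but the slow provider takes 10 minutes
    # of wall-clock time where the fast one takes 4.
    slow = Job(minutes=10, gigabytes=0.03, cpu_seconds=60)
    fast = Job(minutes=4, gigabytes=0.03, cpu_seconds=60)

    for label, job in (("slow infra", slow), ("fast infra", fast)):
        print(
            label,
            "per-minute:", per_minute_cost(job, rate_per_minute=0.01),
            "per-resource:", per_resource_cost(job, rate_per_gb=0.10, rate_per_cpu_second=0.001),
        )
```

Under per-minute billing the slower provider collects more for the same work; under per-resource billing both bills are identical, so speeding up the infrastructure only helps the provider.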

But, it's simply not done, and that's kind of sad because slow CI is probably the biggest engineering time killer in existence. Other than Hacker News ;)

5 comments

For reference, Travis CI was sold to an investment company some time ago.

The strategy is 100% to not improve the infrastructure, fire all the developers to save costs, and milk money from existing customers for as long as possible.

Travis bought out https://news.ycombinator.com/item?id=18978251

Staff laid off the next month https://news.ycombinator.com/item?id=19218036

Around that time and for that reason, I migrated all my personal projects to CircleCI. I did the same for my then employer's open source projects, but CircleCI was also a customer, so it was a pretty easy sell.
This is why I strongly prefer BYO-agents systems (github actions, buildkite, jenkins).

I got our CI pipeline from 90 minutes to 10 minutes by running parts in parallel and applying lots more hardware.

The cost is essentially nothing ($0.10 / build) compared to the time developers spend waiting.

I currently use & recommend buildkite because they offer a preconfigured CloudFormation template that does all of the hard parts for you.

It's funny, this is almost exactly the argument we used to switch to non-BYO systems – moving from Jenkins to CircleCI.

We had Jenkins, but it required so much engineering time to manage, secure, upgrade, etc. Then when we hit scaling limits as our team grew, scaling it out to multiple machines took more time and cost a significant amount as we had to have capacity for peak time, which when all engineers are in one timezone is a significant peak compared to, say, the weekend.

We moved to CircleCI and while I have many frustrations with it, parallelism and ability to speed up the pipeline with minimal development overhead are not really frustrations I have. The cost is also minimal compared to the developer time, and while we're getting "less for our money", because we only pay for the active time, it's actually cheaper for us than Jenkins was just for hardware rental, let alone developer time managing Jenkins.

I can completely see how a different org with different constraints, different deployments, clouds, strategies, provisioning, distribution of engineers, etc, could come to the conclusion that you did – that BYO is better, but I think it does depend on so much.

I think that's more of a comparison of different pieces of software rather than BYO or not. Jenkins is quite the beast no matter what size your projects. There are other fully self-hosted CI solutions, but Jenkins is the biggest one... the hardest one... usually the most fragile one... and for some reason the most popular one...
Yeah that's definitely a factor. There's part of it that's not related though – scaling out build capacity. Setting up a Jenkins build node is actually quite straightforward and reliable on the Jenkins side, similar to a BuildKite node for example. The issue is where that node lives, how it gets provisioned, how it is managed, removed, etc.

For us, it was a bare metal machine where we had to email a sales rep to get a machine added, then spend ~2 hours setting up firewall stuff with semi-manual Ansible scripts. Add to that minimum contract terms and difficulty cleaning machines, and it was a pain to manage.

Conversely, if you've got a reliable autoscaling solution of some sort, and your build manager is capable of poking that as necessary to scale up and down (possible with Jenkins, but hard), then this could be really easy to do and BYO may be feasible.

Having a CI provider give us ~unlimited pay-as-you-go capacity that needed no management on our end and was always a clean environment, that was worth a lot to us in engineering time.

> For us, it was a bare metal machine where we had to email a sales rep to get a machine added, then spend ~2 hours setting up firewall stuff with semi-manual Ansible scripts. Add to that minimum contract terms and difficulty cleaning machines, and it was a pain to manage.

That'll do it. I'm using the Buildkite elastic stack, which took me about 20 minutes to start using and 4-5 hours to dial in to ideal settings (eg adding IAM to allow deploys from agents, getting the right size spot instances etc).

Wait how is github actions a BYO-agent system? I thought you can only run actions on github's infra?
They can run on github's infra, but you can host your own agent as well.

https://docs.github.com/en/free-pro-team@latest/actions/host...

... though github.com is still involved in the round-trip. That is, if your self-hosted agent has to run a workflow in response to a push event, the event still has to come from GH's servers, because GH is still doing the job scheduling. The agent doesn't monitor for pushes itself, and the whole communication channel is specific to GH so you can't swap in another provider.
As an example of what the opposite pricing model looks like, YourBase[1] charges a flat fee per build such that it's in their best interest to make builds as fast as possible. Because of this forcing function, builds are instrumented and cached down to the system call level deterministically, based on file changes. It's amazing what economic incentive can do. (disclaimer: I work at YourBase)

[1]: https://yourbase.io

Would be great if you could make your pricing public. Call-me pricing immediately disqualifies a vendor from consideration for me, and I suspect I'm not alone.
That's great feedback! I've shared it with my team, thanks.
> slow CI is probably the biggest engineering time killer in existence

If you're at the size where slow CI negatively affects your projects, then you're big enough to own your own CI (at least the build agents).

Remember that one-man projects don't need CI, and that CI for small (n<5) teams is almost never the bottleneck. These SaaS CI providers really target the open-source / small-team market and it makes sense that they wouldn't optimize for larger-scale operations.

I disagree with this.

Even a single project with 20-minute build times is enough to slow down or frustrate development.

At the same time, I would not easily justify spending time managing CI infrastructure with my team of 6-10 people.

Things may have changed since then, but the last time I self-managed build agents, it often led to build jobs being tightly coupled to the build agent and installed software versions. With a docker-based CI system, you are forced to have everything specified in code, making it much more maintainable.

Additionally, hosted CI allows me to do 100 parallel builds on Linux, MacOS and Windows. Perhaps this is a niche use case, but I saved a lot of time and reduced build times by an order of magnitude by switching from self-hosted CI on Windows and MacOS to a hosted solution.

> At the same time, I would not easily justify spending time managing CI infrastructure with my team of 6-10 people.

The step from a docker-based build to a proper build agent is a small one. From there, running your CI yourself on a cloud provider is not particularly hard and at size will quickly be cheaper than having an intermediary.

If the number of people who are on the team can fit in a single room, then the release process can easily be run on the laptop of one of the team developers. I don't see how any team of five people consistently releases quickly enough that you would have multiple people trying to release at the same time. A small team should not require tooling to enforce that tests are run before production deploys, that should be either cultural, or part of the script that is easy enough to run on anybody's machine.

Most teams do not need CI until they have a separation of concerns (i.e. code from credential management) that is managed by different people/teams, and therefore all of the decisions cannot be made in a single room.

I do see being able to run test matrices across multiple OS or device options as a reason for smaller teams to adopt CI early.

One man multi platform projects strongly benefit from CI to test the other platforms.
> If you're at the size where slow CI negatively affects your projects, then you're big enough to own your own CI (at least the build agents).

Strongly disagree. When I joined my current company, as an early engineer on a fairly new product, we had an 8 minute deploy time, and I see that as a key part of the rapid iteration cycle that was critical to the company at that stage.

We had 3 engineers and had no time to spend on owning our own CI.

One man projects do indeed need CI. Test matrices can get large very quickly, and it is far easier to let that be somebody else's problem.

The open source Python library I maintain has over 30 instances in the test matrix of Python version x platform x implementations. I don't even have a Windows dev box handy. TravisCI and Appveyor take care of that for me.

> If you're at the size where slow CI negatively affects your projects, then you're big enough to own your own CI (at least the build agents).

I think you vastly underestimate how much stuff people want to fit into CI and how quickly it turns into a big blob. I work as a freelancer helping not-quite-startups-anymore with things like CI speedup and tuning the database queries emitted by their ORM. You know, things where it's easy to build up technical debt.

It's not uncommon for a team of 3-4 to build so many tests and add so many linters and whatnot that CI takes more than an hour. Often, some basic love can bring it down to ~5 minutes but many teams are so focused on building new features that they will not take time to sharpen their tools.

Would you be willing to share some easy improvements that could be made? How can a large amount of tests run so much faster?
It usually comes down to parallelisation; you want to do as much work simultaneously as you can. Saving CI resources _can_ be reasonable if CI resources are very scarce relative to dev time, as in some open source projects. However, if you are paying your devs then the extra few hundred bucks a month for beefier CI is often worth the increase in productivity. A couple of different ways:

- Oftentimes the staging of the CI build can be improved. Devs often set up CI so that linters must pass _before_ actual tests are run. Run them in parallel instead and fail the whole run if the linters don't pass. This is even more important if there are multiple linters (perhaps for different sections of the codebase) and they all get applied serially before any of the tests start.

- Obviously, split up your tests as well so they can run in parallel. If you have a project containing both JS and backend tests, don't wait for one suite to finish before starting the other. Many "bigger" languages also have something akin to parallel_tests (https://github.com/grosser/parallel_tests) that lets you quickly set up multiple databases to separate transactions etc. It also provides tooling to remember the output of previous runs and uses that to equalize the parallel tracks of subsequent runs as much as possible.

- Cache as much as possible. This is a wider topic, but dependencies, docker layers and static assets can all be cached and correctly using this alone can hugely cut CI time. You don't want to know how many projects I've seen that don't have this set up correctly (or at all).

- Longer running projects can have hundreds of database migrations and applying them all to an empty database can take minutes. Big frameworks like Rails can dump the schema for you in a way that you can load in a second instead. Have a separate job that runs in parallel, applies all the migrations, then verifies the output against the schema; all the other jobs load the schema and use that.
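As a rough illustration of the first two points, here is a minimal Python sketch. It is not how parallel_tests or any particular CI provider implements this; the function names, stage names and timings are all made up for the example. `run_stages_in_parallel` starts every stage (linters, test suites) at once and reports each exit code, and `balance_shards` uses durations recorded from a previous run to split test files across parallel tracks, always giving the next-longest file to the currently lightest track.

```python
import heapq
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_stages_in_parallel(stages):
    """Start every stage at once and return {stage name: exit code}.

    `stages` maps a name to an argv list, e.g. {"lint": ["some-linter", "."]}.
    """
    def run(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max(1, len(stages))) as pool:
        futures = {name: pool.submit(run, cmd) for name, cmd in stages.items()}
        return {name: future.result() for name, future in futures.items()}


def balance_shards(durations, shard_count):
    """Greedy split: longest file first, always onto the lightest shard.

    `durations` maps a test file to the seconds it took in a previous run.
    """
    heap = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    shards = [[] for _ in range(shard_count)]
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        load, index = heapq.heappop(heap)
        shards[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return shards


if __name__ == "__main__":
    # Placeholder commands standing in for a real linter and test runner.
    results = run_stages_in_parallel({
        "lint": [sys.executable, "-c", "raise SystemExit(0)"],
        "tests": [sys.executable, "-c", "raise SystemExit(0)"],
    })
    failed = [name for name, code in results.items() if code != 0]
    print("failed stages:", failed)

    # Hypothetical timings from an earlier run, in seconds.
    timings = {"test_api.py": 10, "test_db.py": 8, "test_ui.py": 5,
               "test_auth.py": 4, "test_misc.py": 3}
    print(balance_shards(timings, shard_count=2))
```

The whole run should still fail if any stage returns a non-zero exit code; the only change is that nobody waits for the linter before the tests start.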

Every project I've worked on at Google and Microsoft has used cloud CI services at least in part - travis, etc. My current team had a custom jenkins instance + agents we maintained and we've phased them out in favor of the cloud. It just scales better and the time our team spent maintaining agents can now be spent on writing code and fixing bugs (we do still have people who do work to integrate with the cloud CI services, but it's considerably less)
While I see the mismatched incentives here, I believe they have enough other incentives to make builds faster. I gladly switched over from the concurrency-based pricing to per-minute pricing on CircleCI when it became available. This ended up being significantly cheaper for me, and I never have to worry about how many builds I'm running in parallel.