by teach 1221 days ago
Am I a curmudgeon? Not to take away from this cool writeup, but I'm familiar with a few CI/CD tools, particularly QuickBuild, Jenkins and Spinnaker. So this jumped out at me:

> Our CI process was pretty standard: Every commit in an MR triggered a GitLab Pipeline, which consisted of several jobs.

me: nodding silently

> Those jobs would run in an auto-scaling Kubernetes cluster with up to 21 nodes

me: what the actual deuce?

Is this really "pretty standard"?

7 comments

It's not "pretty standard", but we're working towards it and it looks like a pretty great solution. Our problem is that CI job runners sleep most of the day (low number of commits), but then you have spikes where the jobs are waiting on each other and times get really long. Autoscaling sounds great - you can have lots of runners when you need them and only a single one (or maybe even none? not sure yet) otherwise.

Only if your company is "Cloud Native" and thus real concerned about paying for over-provisioned compute.

GitLab makes it pretty easy to just toss a CI runner process on a VM or a physical box. You can get real far with a couple of rack servers and some Xeons for < $1000. You do have to over-provision if your workload is not very consistent (and of course pay for the power and rack space, and someone to mind them from time to time).

If you have it, it’s awesome. You can get parallel execution of so much, spin up environments for each branch for QA and dynamic scans.

IMO it’s the optimal use case for K8s

You have to be at a certain scale for k8s to make sense in a CI environment. In particular, it needs to be economical to spend 10-50% of a full time employee to maintain the Kubernetes cluster (even if it is some managed thing like EKS).

Also, the duty cycle on the 21 nodes needs to be low enough to justify the complexity over just buying 21 computers (or getting annual pricing on 21 VMs). You could use spot instances for the EKS nodes, but then PRs will randomly fail because their instances disappear. That wastes developer salary money and productivity.

Assuming you have a ventilated room you don't care about, you could run 21 desktop towers off of roughly two to four 120V circuits. (Or buy a rack and pay ~2x as much for the hardware.) 21 build hosts would cost ~$21-42K. Power is probably averaging 50W per machine (they are probably mostly idle even when running tests, since they have to download stuff). That's about 720 kWh per month. US average electrical pricing is $0.20 / kWh; punitive California rates are about $0.40. So, in the punitive case, that's $288 / month.
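For anyone who wants to check the power math, here is a quick sketch in Python. The function name and the 30-day month are my own assumptions, not anything from the article. Without rounding the fleet draw down to 1 kW, 21 machines at 50W come out a little above the 720 kWh / $288 figures, but in the same ballpark.

```python
def monthly_power_cost(machines, avg_watts, price_per_kwh, hours_per_month=24 * 30):
    """Return (kWh per month, dollars per month) for always-on machines.

    hours_per_month assumes a 30-day month (an assumption for illustration).
    """
    kwh = machines * avg_watts * hours_per_month / 1000  # W*h -> kWh
    return kwh, kwh * price_per_kwh


# Figures from the comment: 21 hosts, ~50W average, $0.40/kWh "punitive" rate.
kwh, cost = monthly_power_cost(machines=21, avg_watts=50, price_per_kwh=0.40)
print(f"{kwh:.0f} kWh/month, ${cost:.2f}/month")
```

Swap in your own wattage and local rate; the result scales linearly with each input.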

Running 21 machines probably requires as much annoying maintenance work as EKS, though the maintenance includes swapping bad hardware, fiddling with ethernet cables, and wearing ear protection (if a rack is involved) instead of debugging piles of yaml and AWS roles, optimizing to stay in budget, etc, etc.

I actually find that K8s in a CI environment is a better use case than in production environments.

In production, you're going to have clearly defined deployment rules, traffic patterns, and scalability approaches, where the code for each service probably belongs on its own VM rather than sharing cluster resources.

In a non-production environment, you can feel much more free to overload what's deployed on a node because it's not seeing production traffic. You could have a single k8s instance with 30 different environments (each with their own web, worker, databases, redis, etc) for 30 different branches that represent the issues moving through the pipeline from 5 developers. In prod, every piece of that would be better represented by its own VM.

If you've only got 21 worker machines that probably works out, but if you've got 210, or 2,100 of them to spin up/down, I'd rather be dealing with YAML config (even though I hate YAML config) than get PXE booting working for an on-prem cluster.

Using kube for that is pretty fancy if you aren't already using kube elsewhere, but you don't just have a single Jenkins worker, you have multiple. All that kube is doing is giving a very convenient lever for autoscaling, but other platforms give you this lever as well. If you're not scaling Jenkins workers (or whatever) to match demand, even manually (spin workers down on weekends), you're wasting developer time, compute resources, or both.

Someone's got a new project for Q2 if they aren't doing this already. It's a pretty easy sell if you calculate out the time savings for developers during the busy time of day plus the savings from spinning down compute resources in the middle of the night and on weekends. Being able to put "I saved the company $X in idle compute and saved developers Y hours per day" on your yearly performance review looks pretty good.
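A back-of-the-envelope version of that $X calculation could look like the sketch below. Everything here is hypothetical: the function names, the 12 idle hours per weeknight, the 20 workers, and the $0.10 hourly price are placeholders to replace with your own numbers, not figures from the article.

```python
def weekly_idle_hours(night_hours_per_weekday=12, weekend_days=2):
    """Hours per week when workers could be spun down (assumed schedule)."""
    return 5 * night_hours_per_weekday + weekend_days * 24


def monthly_idle_savings(workers_spun_down, hourly_cost, idle_hours_per_week,
                         weeks_per_month=52 / 12):
    """Rough dollars saved per month by not paying for idle workers."""
    return workers_spun_down * hourly_cost * idle_hours_per_week * weeks_per_month


# Hypothetical inputs: 20 workers scaled down off-hours at $0.10 per worker-hour.
idle = weekly_idle_hours()
savings = monthly_idle_savings(workers_spun_down=20, hourly_cost=0.10,
                               idle_hours_per_week=idle)
print(f"{idle} idle hours/week, roughly ${savings:.0f}/month saved")
```

The developer-hours side (the Y) needs queue-wait data from your CI system, so it is left out here.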

Yeah... I don't know. We don't, but we have talked about it though, because the Azure pipelines are. just. so. slow. On the other hand, more complexity and Rube Goldberg-machinery is not something we long for.

I have started tinkering with FASTBuild, and preliminary testing makes it seem too good to be true, or the best thing since sliced bread. I'm sure there are drawbacks somewhere, but it's really fast.

Then again, a big chunk of our pipeline time is not actually the compilation, but stuff like downloading NuGet packages and uploading artifacts, all of which are. very. very. slow.

Thanks for this comment. I guess sometimes we (developers) take things for granted when we shouldn't, and that puts a lot of pressure on us instead of celebrating our wins.

I would now change "pretty standard" to "we didn't reinvent the wheel" xD :pray:. In the end, I meant that we use existing tools and "just" put them together.

Yes. In fact, it's standard enough that it's a little odd that they specify "autoscaling" and "21 nodes", when they could have simply said "we use the Kubernetes executor".

Even if you are using SaaS GitLab, there are still good reasons to have custom runners, and kube is one option for running them.