by aarondl0 3265 days ago
Excellently designed system. Very poorly implemented.

Our team's been using it since its initial releases. It's been nothing short of disastrous for all but the smallest pipelines.

The design is great. Keeping configs in yaml instead of little white boxes in a Jenkins database is much better. Pipelines as a first class concept. It feels inspired by a functional programming language. You get great build reproducibility since there are no workers that get dirtier over time if you forget to clean up. The resource model is awesome. Very cool stuff. I'm hoping every CI system learns from what's here. Second to none in design.

However it performs like a dud. No scheduling to speak of, just runs everything as soon as it can. We've run into nodes dying under load (-not- underprovisioned, could run all these jobs manually at once on these monsters). We've run into problems with volume reaping, fork bombs, ui freezes, everything under the sun.

I really like Concourse and will hopefully one day be able to come back to it when its implementation is as solid as its paradigms are.

But I'd avoid using it for now.

1 comment

> However it performs like a dud. No scheduling to speak of, just runs everything as soon as it can. We've run into nodes dying under load (-not- underprovisioned, could run all these jobs manually at once on these monsters). We've run into problems with volume reaping, fork bombs, ui freezes, everything under the sun.

I've used Concourse as a consumer for 3 years, and I've very rarely seen any of the problems you're describing, even on the older versions, and certainly not in the last year or so.

> no scheduling to speak of

Concourse has a massive scheduling system built into it: https://github.com/concourse/atc

Furthermore, you can configure jobs to run in serial (default is parallel).

> ui freezes

Put your `web` binary on a decently-sized VM and your problem should disappear. Also, don't have your workers on the same VM as your `web`.

UI freezes are completely client-side; they come from the Elm implementation and your browser's execution of the code. The size of your VM doesn't matter at all.

ATC is more of a dependency scheduler. The code [1] shows that it basically gets all pending jobs and then runs them. There's no concept of queueing or of a maximum number of jobs; you just have to hope your limits are high enough (max containers, max tasks in systemd, max fds in the same) and that your machine doesn't fall over in the attempt.
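To make the distinction concrete, here is a minimal Python sketch. It is not ATC's code: `run_all_pending` just mimics "start everything that's pending", while `run_with_cap` shows the kind of queueing I'm saying is missing. The function names, the `max_in_flight` parameter, and the job names are all made up for illustration, and the capped version assumes every started job finishes before the next tick.

```python
from collections import deque


def run_all_pending(pending):
    """Start every pending job at once: peak load equals the backlog size."""
    started = list(pending)
    return started, len(started)


def run_with_cap(pending, max_in_flight):
    """Start at most max_in_flight jobs per tick; the rest wait in a queue.

    Simplifying assumption: every started job finishes before the next tick.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    queue = deque(pending)
    ticks = []
    while queue:
        batch_size = min(max_in_flight, len(queue))
        ticks.append([queue.popleft() for _ in range(batch_size)])
    return ticks


jobs = [f"job-{i}" for i in range(10)]  # hypothetical backlog
_, peak_uncapped = run_all_pending(jobs)
ticks = run_with_cap(jobs, max_in_flight=4)
peak_capped = max(len(batch) for batch in ticks)
print(peak_uncapped, peak_capped, len(ticks))
```

With a cap, peak load is bounded by a number you choose; without one, it is bounded only by how much work happens to be pending.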

The "massive" scheduling system also has no idea which nodes need work and which do not [2], so the idea is to heavily overprovision until it stops falling over (on top of the already beefy requirements that others have alluded to in this thread).

You cannot serialize across multiple pipelines, only within a pipeline. If I have 10 pipelines, they will all run independently, and there's nothing you can do about it other than attempt serialization with the pool resource (which we've recently had problems with; it also appears to be buggy, and we're looking at submitting patches).
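The workaround amounts to a shared lock that every pipeline has to claim before it runs. A rough Python sketch of the idea, assuming a simple in-memory pool: `LockPool`, `claim`, `release`, and the lock and pipeline names are hypothetical and are not the pool resource's actual interface.

```python
class LockPool:
    """A pool of named locks, each either unclaimed or claimed by one owner."""

    def __init__(self, names):
        self.unclaimed = list(names)
        self.claimed = {}  # lock name -> owner

    def claim(self, owner):
        """Claim any free lock; return its name, or None if all are taken."""
        if not self.unclaimed:
            return None
        name = self.unclaimed.pop(0)
        self.claimed[name] = owner
        return name

    def release(self, name):
        """Return a claimed lock to the pool."""
        del self.claimed[name]
        self.unclaimed.append(name)


pool = LockPool(["deploy-lock"])   # a single lock: one pipeline at a time
first = pool.claim("pipeline-a")
second = pool.claim("pipeline-b")  # gets nothing: the only lock is taken
pool.release(first)
retry = pool.claim("pipeline-b")   # succeeds once pipeline-a has released
```

The catch is that every pipeline has to opt in, and a pipeline that fails before releasing leaves the lock stuck until someone cleans it up.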

I've got a tremendous amount of experience with this system, and I believe it's everything I made it out to be. The rebuttals you've provided to my issues simply reflect a lack of understanding of our context; not every user will have the same experience with any given product.

[1] https://github.com/concourse/atc/blob/master/scheduler/sched...

[2] https://github.com/concourse/concourse/issues/675