Hacker News
by shykes 108 days ago
In moments like this, it's useful to have a "break glass" mode in your CI tooling: a way to run a production CI pipeline from scratch, when your production CI infrastructure is down. Otherwise, if your CI downtime coincides with other production downtime, you might find yourself with a "bricked" platform. I've seen it happen and it is not fun.

It can be a pain to set up a break-glass mode, especially if you have a lot of legacy CI cruft to deal with. But it pays off in spades during outages.

I'm biased because we (dagger.io) provide tooling that makes this break-glass setup easier, by decoupling the CI logic from CI infrastructure. But it doesn't matter what tools you use: just make sure you can run a bootstrap CI pipeline from your local machine. You'll thank me later.

5 comments

This is a must when your systems deal with critical workloads. At Fastly, we process a good chunk of the internet's traffic and can't afford to be "down" while waiting for the CI system to recover in the event of a production outage.

We built a CI platform using dagger.io on top of GH Actions, and the "break glass" pattern was not an afterthought; it was a requirement (and one of the main reasons we chose dagger as the underlying foundation of the platform in the first place).

I would really love to hear more about this, but my cursory search didn't find a write-up about it.

I did a PoC of Dagger for an integration and delivery workload and loved the local development experience. Being able to define complex pipelines as a series of composable actions in a language which can be type checked was a great experience, and assembling these into unix-style pipelines felt very natural.

I struggled to go beyond this and into an integration environment, though. Dagger's current caching implementation is very much built around there being a single long-lived node and doesn't scale out well, at least without the undocumented experimental OCI caching implementation. Are you able to share any details on how Fastly operates Dagger?

We don't have any public posts about our setup (yet), but I think it's time we did. I'll put some time into it and will come back here to link to it.
Being able to run the exact same pipeline locally and in any CI environment is the most compelling feature of dagger. It frees you from any underlying platform, so you can adapt more easily.
Times like this are when I'm so happy I don't deploy to a production environment; instead, we release software that customers (after extensive qualification) can install in their environment on their airgapped networks, using a USB stick to cross the air gap. If we miss a release by a day or three, there is enough slack in the process before it goes to the customer that no one will be any the wiser.

Crazy in 2026, but installable software still has some pros, for both the developer and the customer. And I would personally love it if I could do more things that way.

I had that revelation for embedded software. After years of live service hosted software, I released an embedded device. It just runs happily, somewhere, who knows, not me.
100%. We used to design the pipeline in a way that is easily reproducible locally, e.g. doesn’t rely on plugins of the CI runtime. Think build.sh shell script, normally invoked by the CI runner but just as easy to run locally.
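
To make the build.sh idea concrete, here is a minimal sketch of the same pattern in Python rather than shell: one entry point with no dependency on any CI runner's plugins, so the runner and a laptop invoke it the same way. The step names and commands are hypothetical placeholders, not anyone's actual pipeline.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each (name, command) step in order; stop at the first failure.

    Returns the list of step names that succeeded.
    """
    succeeded = []
    for name, command in steps:
        print(f"==> {name}")
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"step failed: {name} (exit code {result.returncode})")
            break
        succeeded.append(name)
    return succeeded


# Hypothetical steps; real ones would call your compiler, test runner, etc.
PIPELINE_STEPS = [
    ("build", [sys.executable, "-c", "print('building')"]),
    ("test", [sys.executable, "-c", "print('testing')"]),
]
```

The CI job then shrinks to a single call to `run_pipeline(PIPELINE_STEPS)`, and running the same call locally is the break-glass path when the CI service is down.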
My automation is always an escalation of a run book that has gotten very precise and handles corner cases.

Even if I get the idea of an automation before there’s a run book for it.

I like run scripts. Shell or python scripts that do nothing other than prompt the user with what to do, or which choice to make, and wait for them to hit a key to proceed to the next step. Encode the run book flowchart into an interactive script. Then if a step can be automated, the run book script can directly call that automation. Eventually you may end up with a fully automated script, but even if you don't it can still be a significant help.
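
A rough sketch of what such a run script could look like in Python. The `Step` type, the `run_runbook` function and the example steps are all made up for illustration; the point is only that manual steps prompt and wait, while a step that has an automation attached runs it directly.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One run book step: instructions for a human, plus optional automation."""
    title: str
    instructions: str
    automation: Optional[Callable[[], None]] = None  # None means "manual step"


def run_runbook(steps, ask=input, show=print):
    """Walk the steps in order; return the titles of the steps that completed."""
    done = []
    for number, step in enumerate(steps, start=1):
        show(f"[{number}/{len(steps)}] {step.title}")
        if step.automation is not None:
            # Automated step: run it directly, no prompt needed.
            step.automation()
        else:
            show(step.instructions)
            answer = ask("Press Enter when done, or type 'q' to abort: ")
            if answer.strip().lower() == "q":
                show("Aborted.")
                break
        done.append(step.title)
    return done


# Hypothetical run book, for illustration only.
EXAMPLE_STEPS = [
    Step("Disable deploy triggers", "Open the CI UI and disable the triggers."),
    Step("Record start time", "", automation=lambda: print("start recorded")),
    Step("Start manual deployment", "Kick off the deployment by hand."),
]
```

Calling `run_runbook(EXAMPLE_STEPS)` walks an operator through the steps interactively. Automating a step later only means filling in its `automation` field; the rest of the script stays the same.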
Someone gave me that idea about eight years ago and I spent the next several trying to look for a nail for that hammer.

I eventually expanded the one I wrote to include URLs to the right places in Bamboo to do things like disable triggers or start manual deployments. By the time I finished that we were doing 10x as many canary deployments as we had been before, and we’re retiring tech debt way faster because of it. 10/10 would do again.

npm publish will open a web browser for you for passcode entry, and I think I’ll do that next time instead of using cut and paste.

It’s a hard sell. I always get blank looks when I suggest it, and often have to work off book to get us there.

I generally recommend that the break glass solution always be pair programmed.

A while back I think I heard you on a podcast describing these pain points. Experienced them myself; sounded like a compelling solution. I remember Dagger docs being all about AI a year or two ago, and frankly it put me off, but that seems to have gone again. Is your focus back to CI?
Yes, we are re-focused on CI. We heard loud and clear that we should pick a lane: either a runtime for AI agents, or deterministic CI. We picked CI.

Ironically, this makes Dagger even more relevant in the age of coding agents: the bottleneck increasingly is not the ability to generate code, but to reliably test it end-to-end. So the more we all rely on coding agents to produce code, the more we will need a deterministic testing layer we can trust. That's what Dagger aspires to be.

For reference, a few other HN threads where we discussed this:

- https://news.ycombinator.com/item?id=46734553

- https://news.ycombinator.com/item?id=46268265

That's good - I'll reconsider Dagger.

Yes, I agree with your assessment. AI means a higher rate of code changes, so you need more robust and fast CI.