Good gravy, this is a lot to unpack. It's an alarming story from the very beginning, and a cautionary tale of how tempting it is to do everything with Jenkins, even though it's an appropriate tool for absolutely nothing in the Year of our Lord 2021.

> As part of our automation setup, we continuously run integrity jobs to inspect our Jenkins nodes.

Why on earth would you self-host this in Jenkins? This is a monitoring and alerting problem.

> These jobs check system configurations and properties and look to see if any node is failing those checks.

What year is it? We've solved this with immutable infrastructure or system integrity monitoring. Or both.

> The checks automatically mark Jenkins nodes as offline when any of those checks fail and notifies our Mobile Build & Release team via a Slack message.

"Mark" offline? Why not just terminate it? And why do we care if build nodes come and go? These should be cattle, not pets. If they all die at once, that's bad. If they're cycling in and out, that's business as usual.

> When our Jenkins UI stopped working, we noticed two things:
> 1. We had recently upgraded Jenkins and all its plugins to a newer version

Did they just now learn what an awful idea this is? All of this at once, really? This isn't so much a Jenkins problem (though let's be clear, Jenkins is a problem) as it is a remedial engineering problem. The top takeaways should be "choose appropriate tools for the task at hand" and "don't make reckless decisions with brittle systems".

Given that these nodes are for mobile builds, there might be some macOS nodes in there for iOS builds. These might be in-house machines they maintain. Or, if they use a cloud provider, simply killing nodes and spinning up new ones might carry real costs. For example, for EC2 Mac instances:
> EC2 Mac instances are available for purchase as Dedicated Hosts through On Demand and Savings Plans pricing models. Billing for EC2 Mac instances is per second with a 24-hour minimum allocation period to comply with the Apple macOS Software License Agreement.
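
To put rough numbers on that, here is a minimal sketch of what a 24-hour minimum does to the cost of cycling a node. It assumes the minimum is charged each time a host is allocated and then released, and the hourly rate is a made-up placeholder, not a real AWS price; `billed_seconds` and `cycle_cost` are hypothetical helper names.

```python
# Sketch of per-second billing with a 24-hour minimum allocation period.
# Assumption: the minimum applies once per allocate/release cycle of a host.

MIN_ALLOCATION_SECONDS = 24 * 60 * 60  # 24-hour minimum from the quoted text

# Placeholder rate for illustration only; NOT an actual AWS price.
HYPOTHETICAL_HOURLY_RATE = 1.00


def billed_seconds(runtime_seconds: int) -> int:
    """Seconds actually billed: real usage, but never below the minimum."""
    return max(runtime_seconds, MIN_ALLOCATION_SECONDS)


def cycle_cost(runtime_seconds: int, hourly_rate: float) -> float:
    """Cost of one allocate/use/release cycle at the given hourly rate."""
    return billed_seconds(runtime_seconds) / 3600 * hourly_rate


if __name__ == "__main__":
    # A 20-minute build, a full day, and 30 hours of use.
    for runtime in (20 * 60, 24 * 3600, 30 * 3600):
        hours_billed = billed_seconds(runtime) / 3600
        cost = cycle_cost(runtime, HYPOTHETICAL_HOURLY_RATE)
        print(f"used {runtime / 3600:5.2f} h, billed {hours_billed:5.2f} h, cost {cost:6.2f}")
```

Under those assumptions, a host that is torn down after a 20-minute build is billed the same as one kept for a full day, which is why "just terminate it" is not free for Mac nodes.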