by xjia 928 days ago
I had a similar experience with ARC (actions-runner-controller).

One of the machines in the fleet failed to sync its clock via NTP. Once a job X got scheduled to it, the runner pod failed authentication because of the incorrect clock, and then the whole ARC system started to misbehave: job X was stuck without runners until another workflow job Y was created, and then X ran but Y became stuck. There were other weird behaviors like this, so I eventually rebuilt everything on VMs and stopped using ARC.
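
A minimal sketch of the kind of pre-flight check that would catch a drifted clock before a runner accepts jobs. This is not part of ARC; the function names and the 5-minute tolerance are assumptions for illustration, and the trusted reference time would have to come from somewhere outside the machine (an NTP query, for example).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance; the real limit depends on the auth system in use.
MAX_SKEW = timedelta(minutes=5)


def clock_skew(local_now: datetime, reference_now: datetime) -> timedelta:
    """Absolute difference between the local clock and a trusted reference."""
    return abs(local_now - reference_now)


def safe_to_accept_jobs(
    local_now: datetime,
    reference_now: datetime,
    max_skew: timedelta = MAX_SKEW,
) -> bool:
    """True if the local clock is close enough to the reference to take jobs."""
    return clock_skew(local_now, reference_now) <= max_skew


if __name__ == "__main__":
    reference = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    drifted = reference + timedelta(hours=1)  # simulated bad clock
    print(safe_to_accept_jobs(drifted, reference))
```

A machine that fails this check could be cordoned off instead of receiving job X in the first place.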

Using VMs also allowed me to support the use of the official runner images [0], which is good for compatibility.

I feel more people would benefit from managed "self-hosted" runners, so I started DimeRun [1] to provide cheaper GHA runners for people who don't have the time/willingness to troubleshoot low-level infra issues.

[0]: https://github.com/actions/runner-images
[1]: https://dime.run

2 comments

Exactly what you're describing is what I explained to my colleagues as "stealing runners" :)

If something fails and you don't keep idle runners around (which wastes resources), things start to snowball.

The module posted here has a way to avoid that: only runners requested by a job can be used by it (or idle runners, if you have those).
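
A toy model of that rule, to make the difference from "stealing" concrete. This is not the module's actual code; `Runner`, `requested_by`, and `pick_runner` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Runner:
    name: str
    # Job id that caused this runner to be created; None means idle pool.
    requested_by: Optional[str] = None
    busy: bool = False


def pick_runner(job_id: str, runners: List[Runner]) -> Optional[Runner]:
    """Give a job its own runner first, then an idle one, never another job's."""
    for wanted in (job_id, None):
        for runner in runners:
            if not runner.busy and runner.requested_by == wanted:
                runner.busy = True
                return runner
    return None  # the job waits instead of stealing a runner


if __name__ == "__main__":
    fleet = [Runner("r-x", requested_by="X")]
    print(pick_runner("Y", fleet))  # Y cannot take X's runner
    print(pick_runner("X", fleet))  # X gets the runner it requested
```

With this rule, a failed runner for job X leaves X waiting, but job Y still gets the runner it asked for.
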
It's only really usable for things that don't involve secrets. I'd be very concerned about using anything third-party in CI, let alone the runner itself. Supply chain attack senses tingling :).
Yes, I totally understand the concern. We are actively working on SOC 2 and other compliance stuff to help with this. But honestly, I feel the compliance requirements are weaker than what we actually implemented. For example, proper secure boot and whole-disk encryption (without sacrificing performance) are mandatory in our mindset, but these specific things don't get reflected in compliance.

Instead of offering it only as a service, I'm also open to selling the software+hardware solution behind it, so you can have it on-prem. Do you think that's something you would consider, given the constraints on supply chain security?

We're too small for on-prem services, so not your target market. Just sharing my 2c as someone who has been burned by self-hosting GitHub runners one too many times.