Hacker News
by latchkey 1694 days ago
I run large datacenters with thousands of boxes.

I have a little app (written in golang) installed on each box that is effectively a task runner. Tasks can be written to do anything, including installing software with apt-get.

If apt-get fails to run, the task fails (context.WithTimeout) and is run again at a later date. No random hacks needed. Everything is built to be idempotent, self-healing and eventually consistent.


Might be a dumb question, but would systemd not be able to do this? Do your boxes not run systemd?
Not a dumb question. My process runs from systemd. But the process itself needs to do all sorts of custom stuff, which is baked into it.
Would you be interested in sharing this tool with the world?
Thanks for asking. This is a huge amount of IP for my business. It is also very custom for our use case. I'm not trying to create a general purpose DSL or anything.
cfengine3 does that (and much more) for me.
Great! Happy it fulfills your use case.

I can assure you that my use case is a bit more involved than what cfengine can provide.

Cheers.

cron does that for me, and it's already installed
Cron is not a very good task runner on its own if you have lots of hosts, care that your tasks actually run, care that they succeed, need sequencing or need to ensure that multiple instances don't conflict.

cron is fine as part of a task scheduler, but even for very basic use cases you'll hit its limitations and will have to work around them.

Yeah, but how can you just keep using the same tools for 30+ years? Won't someone think of the developers!?

/s