Hacker News
by happysadpanda2 13 days ago
I interpreted it more like "I have these 500 different cronjobs all spread out across $unit_of_time. If the system is down for longer than $unit_of_time and then comes back, do all 500 jobs start running instantly (since they missed their previous deadline)?"

Just to be clear, this isn't default systemd timer behaviour; you need to opt in by setting Persistent=true. If you have hundreds of jobs like this, you need a proper queue, and neither cronie nor systemd is the right tool, because at that scale you'd surely need better observability.
You could implement this with a GitLab instance on a separate system, like two VMs in Proxmox or two physical machines, and a shell executor running on them. GitLab CI has a nice feature for limiting concurrency: resource groups. Say you have 500 jobs spread through the day and the system stays offline for a while; when it comes back online it'll start processing the jobs, but will only run one at a time. You get visibility, logs, queue monitoring and an API to query data.
> If you have hundreds of jobs like this you need a proper queue and neither cronie nor systemd is the right tool

Eh, sometimes, but you can get pretty far with one of two approaches:

1. Careful use of Requires= and Wants= (plus After= for ordering, since those two alone don't serialize anything) to group your scripts into chains of jobs, which gets you fixed parallelism (though at 100s of jobs, I hope you're generating those unit files with a tool like Puppet or https://github.com/karlicoss/dron or something and not doing this by hand).

2. Even better, just use a lockfile. `ExecStart="flock -F $TMPDIR/mylock <command>"` is pretty hard to beat. Use -F so as not to confuse KillMode and resource accounting and you're golden. Just don't use flock(1) timeouts; let systemd handle that. Heck, if you have that many cron jobs, you should be doing this even if you don't use systemd; otherwise job latency changes can cause reboot-style thundering herds out of the blue.
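To make the lockfile approach concrete, here's a rough sketch outside of systemd: three dummy jobs are launched at the same moment, the way a post-reboot herd would be, and each is wrapped in flock(1) on a shared lock file. The temp directory, the lock and log paths, and the echo/sleep "jobs" are all made up for illustration, and it assumes a util-linux flock recent enough to have -F.

```shell
#!/bin/sh
# Hypothetical demo: three "jobs" start together but run one at a time,
# because each one is wrapped in flock(1) on the same lock file.
workdir=$(mktemp -d)
lock="$workdir/jobs.lock"   # hypothetical lock path
log="$workdir/jobs.log"     # hypothetical log path

for i in 1 2 3; do
    # -F (--no-fork): flock execs the command instead of forking, so the
    # job process itself holds the lock until it exits.
    flock -F "$lock" sh -c "echo start $i >> '$log'; sleep 0.2; echo end $i >> '$log'" &
done
wait

# Each "start N" line is followed directly by its own "end N" line;
# the order in which the three jobs win the lock is not guaranteed.
cat "$log"
```

No timeout flags are passed to flock here, matching the advice above to leave timeouts to the service manager.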

If you need semaphore behavior and still don't want a real job queue, waitlock (https://github.com/bigattichouse/waitlock) and many other CLIs have you covered.

1. This is spread across 500 files, so maintainability goes out the window.

2. If this fails for some reason, such as misconfiguration or an unexpected shutdown, you could end up with a failure that's hard to track down or debug.

These are fine with a few services chained together, but that requires a shallow depth of dependencies. Chaining these theoretical hundreds of jobs together like this isn't practical or safe.