by chasil
1057 days ago
> "I have a backup job that is triggered by a timer. I want to know when that job fails so I can investigate and fix it."
This is really more in the realm of a shell script. You could do this verbosely:
  #!/bin/sh
  /path/to/my/backup_job
  if [ $? -ne 0 ]
  then /path/to/my/failure_alert
  fi
...or, you could do this tersely:
  #!/bin/sh
  /path/to/my/backup_job || /path/to/my/failure_alert
The wrapper script would go into the service unit that your timer triggers (as its ExecStart= command). I like dash.
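One caveat with the terse `||` form: a script's exit status is that of its last command, so when the job fails and the alert succeeds, the wrapper exits 0 and systemd records the run as successful. A minimal sketch that alerts and still propagates the job's failure, using the same placeholder paths; passing the exit code to the alert script is a hypothetical convention, not something the original alert script is known to accept:

```sh
#!/bin/sh
# run_with_alert JOB ALERT
# Runs JOB; if it exits non-zero, runs ALERT with the job's exit code as its
# only argument (hypothetical convention), then returns the job's exit code
# so the calling service still reports the failure.
run_with_alert() {
    "$1"
    job_rc=$?
    if [ "$job_rc" -ne 0 ]; then
        # A failing alert is reported on stderr (by default captured in the
        # journal under systemd) instead of silently masking the job's status.
        "$2" "$job_rc" || echo "run_with_alert: alert command $2 failed" >&2
    fi
    return "$job_rc"
}

# Usage in the wrapper script, with the placeholder paths from above:
#   run_with_alert /path/to/my/backup_job /path/to/my/failure_alert
#   exit $?
```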
What happens when the /path/to/my/failure_alert script fails?
What happens when your backup job returns success but didn't generate any output?
What happens when you turn off the systemd timer for a while and forget to turn it back on?
What happens when the server stops running, has a full disk, or has a networking issue?
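The second question can be partly handled on the backup host itself by treating "exit status 0 but no output" as a failure. A sketch, assuming the job writes a single output file whose path you know (the OUTPUT argument is hypothetical). Note that nothing running on the same machine can catch a disabled timer or a dead server:

```sh
#!/bin/sh
# verify_backup JOB OUTPUT
# Runs JOB, then requires both a zero exit status and a non-empty OUTPUT
# file. Returns 1 if the job failed, 2 if it "succeeded" without producing
# output, 0 otherwise; the reason is printed on stderr.
verify_backup() {
    if ! "$1"; then
        echo "verify_backup: job exited non-zero" >&2
        return 1
    fi
    # test -s: true only if the file exists and has a size greater than zero.
    if [ ! -s "$2" ]; then
        echo "verify_backup: job succeeded but $2 is missing or empty" >&2
        return 2
    fi
    return 0
}
```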
Ultimately, some of the other answers are better. You should have a separate system monitoring this. And that separate system should track every time a backup happens, either by checking that the backup exists at the target location (good), or by checking that the backup system sent a "Yes, I did a backup" message (ok, but not as good).
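The "check the backup exists at the target location" approach can be as small as a freshness check run from the monitoring host. A sketch, assuming the monitor can see the backup destination as a local directory (e.g. it is the destination, or has it mounted); the directory and schedule in the example are hypothetical:

```sh
#!/bin/sh
# recent_backup_exists DIR DAYS
# Succeeds if DIR contains at least one regular file modified less than
# DAYS*24 hours ago (find's -mtime counts whole 24-hour periods). A missing
# or unreadable DIR counts as "no recent backup".
recent_backup_exists() {
    [ -n "$(find "$1" -type f -mtime "-$2" 2>/dev/null | head -n 1)" ]
}

# Example cron entry body on the monitoring host (paths are hypothetical):
#   recent_backup_exists /srv/backups/myserver 1 || /path/to/my/failure_alert
```

Because it looks at the result rather than at the job, this also catches the disabled-timer and dead-server cases.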
I use Telegraf for data collection, InfluxDB (v1) as a time series database, and Grafana (v7) for graphing and alerts. I'm using older versions of InfluxDB and Grafana because they just work and keep on working. Many other tools will work just as well as these do; I'm just giving them as an example.
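For the "Yes, I did a backup" style of reporting, the backup host could push a point to InfluxDB 1.x's HTTP `/write` endpoint in line protocol (this bypasses Telegraf). A sketch; the measurement, tag and field names, the URL, and the database name are all assumptions:

```sh
#!/bin/sh
# backup_line HOST JOB STATUS BYTES
# Formats one point in InfluxDB line protocol; the "i" suffix marks integer
# fields. Names are hypothetical. Tag values are assumed to contain no spaces,
# commas or equals signs (those would need backslash-escaping). No timestamp
# is written, so InfluxDB stamps the point with its own time on arrival.
backup_line() {
    printf 'backup,host=%s,job=%s status=%di,bytes=%di\n' "$1" "$2" "$3" "$4"
}

# send_point BASE_URL DB LINE
# POSTs the point to an InfluxDB 1.x /write endpoint. curl -f turns an HTTP
# error response (4xx/5xx) into a non-zero exit status.
send_point() {
    curl -fsS -XPOST "$1/write?db=$2" --data-binary "$3"
}

# Example at the end of the backup wrapper (URL, database, file hypothetical):
#   size=$(wc -c < /path/to/backup.tar | tr -d ' ')
#   send_point http://monitor.example:8086 backups \
#       "$(backup_line "$(hostname)" nightly 0 "$size")"
```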
Such a system may seem like overkill to just keep track of a few things, but you need something that'll tell you when you get no data. So at a minimum you'll want something on a separate server and you'll want it to send alerts when an expected event doesn't happen.
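The key logic is alerting on absence: silence has to be an alert, not "all clear". Grafana alert rules have a setting for what to do when a query returns no data. As a tool-agnostic sketch, assuming whatever receives the "backup done" messages writes the Unix time of the last one to a stamp file (the file, its path and the 26-hour limit are hypothetical; `date +%s` is supported by GNU, BSD and BusyBox `date`):

```sh
#!/bin/sh
# heartbeat_stale LAST NOW MAX_AGE
# LAST and NOW are Unix timestamps, MAX_AGE is in seconds. Succeeds when the
# last heartbeat is more than MAX_AGE seconds old, i.e. when to alert.
heartbeat_stale() {
    [ $(( $2 - $1 )) -gt "$3" ]
}

# needs_alert STAMP_FILE MAX_AGE
# A missing or empty stamp file means no data has ever arrived; that is
# treated as an alert condition rather than as "nothing to report".
needs_alert() {
    [ -s "$1" ] || return 0
    heartbeat_stale "$(cat "$1")" "$(date +%s)" "$2"
}

# Example cron job on the monitoring host (path and limit hypothetical):
#   needs_alert /var/lib/backup-heartbeat 93600 && /path/to/my/failure_alert
```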