by rtkwe 1164 days ago
It's a constant pain of mine trying to get people on our teams at work to stop sending business-as-usual or "successfully completed $PROCESS" emails from our batch processes. They absolutely drown my inbox, so I'm forced to filter them, and then the actual failures get buried in the unchecked "batch spam" folders.
3 comments

I had a boss who had an inbox with literally hundreds of thousands of unread emails. A good chunk of those emails were "success" messages from batch processes.

It's quite correct to send a "success" message when a batch process is completed successfully, but it's quite wrong to send that message to a human. It should be sent to a machine that should translate a missing success message into an error message/alert for humans to respond to.

For example, I have a set of nightly backup jobs. The last step of each backup process is to send a success message to my monitoring system. I only get a "Missing Backup" alert when the monitoring system detects that it didn't receive the success message it expected for a particular backup.
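
The idea is small enough to sketch in Python. This is only an illustration of the pattern, not the monitoring system I actually run: the `HeartbeatMonitor` class, the job names, and the 26-hour window are all made up for the example.

```python
from datetime import datetime, timedelta


class HeartbeatMonitor:
    """Hypothetical monitor: turns a missing success message into an alert."""

    def __init__(self, max_age):
        # Longest allowed gap between two success messages for one job.
        self.max_age = max_age
        self.expected = set()      # jobs that must report in
        self.last_success = {}     # job name -> time of last success message

    def expect(self, job):
        """Register a job whose success message we expect to see."""
        self.expected.add(job)

    def record_success(self, job, when):
        """Called by the last step of a job; no human is notified."""
        self.last_success[job] = when

    def missing(self, now):
        """Return the jobs whose success message is absent or too old."""
        overdue = []
        for job in sorted(self.expected):
            last = self.last_success.get(job)
            if last is None or now - last > self.max_age:
                overdue.append(job)
        return overdue


if __name__ == "__main__":
    # Assumed setup: nightly jobs, with some slack on top of 24 hours.
    monitor = HeartbeatMonitor(max_age=timedelta(hours=26))
    monitor.expect("backup-db")
    monitor.expect("backup-files")

    now = datetime(2024, 1, 2, 6, 0)
    monitor.record_success("backup-db", now - timedelta(hours=5))

    # Only the job that stayed silent produces an alert for a human.
    for job in monitor.missing(now):
        print(f"Missing Backup: {job}")
```

The point of the design is that the success messages still exist, but only the machine reads them. A human hears about a job only when `missing()` returns it.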

My old boss didn't seem to understand the concept that people don't generally notice missing messages. Or he was too lazy/incompetent to use a monitoring system that could translate gaps in successes into errors.

Even that is utterly unnecessary, because we use Control-M for basically all of the batch work in my area that I know of, and there's already automation that opens an Incident on a job failure, which can flow into the whole on-call system! If a job or cycle is critical and needs to finish by a certain time, you can set up messages to go out at that time and everything.

My pet peeve is these $PROCESS notifications that go to Slack channels. I worked at a company that had an #engineering_humans Slack channel because we got chased out of #engineering by bots.

I'm fine if they go to THEIR OWN Slack channel. Then I can mute or leave that channel.

Of course, it's a different problem if those notifications have a mix of actionable and non-actionable messages (e.g. both success and error messages). Then it's a signal/noise problem.

The one that pushes my buttons is alarms that have no docs attached, so when they go off at 2AM, they just get muted until someone comes in and complains at 6AM.