by virtue3 634 days ago
On call engineers fixing on call bugs is one of the simplest and most straightforward ways out of the hole.

You then also have a direct cost of being “on call” accounted for and on the sprint board.


"on call" shouldn't be an additional shift to have the employee at their desk. It's an emergency service with a defined SLA (acknowledge pager within X time, review issue and triage or escalate within Y time. Work on issue until service is restored/bug is rolled back (but not necessarily to the point of completing a long term fix)
This depends. There are several on-call paradigms.

In 2 of the 3 companies I've worked at that have on-call, the On Call rotation has been "the totality of your duties is being on call for [X] duration". There are no features to push; there is Op X and tickets of varying priority levels.

I've always seen it as a 'mode of operation' for a time period. Same schedule/timing unless something bad happens. Then you're the one to be woken up/disturbed. Outside of that... you're generally free to do whatever maintenance, process, or feature work you like.

This is helpful when the incidents are less 'something to revert'... and more something to do or completely remove. For example, if CI/CD relies on things on the internet, deploying caches removes a laundry list of potential snags.

On call is a bit bipolar as a result. Either comfortably wandering around looking for something worth working on, or knowing what it is - dashing to put out flames! It's not sustainable so we all take turns.

I believe a poster above was correct with their intuition. I feel there's a broken/missing feedback loop. Regular incidents happen, but they shouldn't be constant. The goal should be to eradicate them, accepting a downward trend.
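
One rough way to close that feedback loop is to check whether incident counts are actually trending down. A minimal sketch, assuming you can export incidents per week as a plain list (the function name `incident_trend` and the weekly bucketing are my own assumptions):

```python
def incident_trend(weekly_counts):
    """Least-squares slope of incidents per week.

    A negative result means the counts are trending downward.
    """
    n = len(weekly_counts)
    if n < 2:
        raise ValueError("need at least two weeks of data")
    mean_x = (n - 1) / 2  # weeks are indexed 0..n-1
    mean_y = sum(weekly_counts) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(weekly_counts))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```

For example, `incident_trend([9, 7, 8, 5, 4, 2])` comes out negative even though week three was worse than week two, which is the "accept a downward trend" idea: the direction matters more than any single week.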