by fweespeech
3807 days ago
1) We don't keep a formal on-call rotation, but three of us are tied into an automated alert system, and whoever has a chance to take care of an alert does. We are all full-stack devs, so generally we can fix the problem at the time. If it's complicated, we can keep things hobbling along and fix it properly later.

2) We get 3-4 alerts a year that have to be handled before the next business day.

3) As such, there is no real prioritization, triage, etc. You resolve it immediately; there is no other priority. [ Any on-call event == lost money ]

> * how do you manage for other teams' risk? (ie their api goes down, you can't satisfy your customers)

Asynchronous processing. I buffer the work until their API is back up. I do this for literally dozens of companies, from small manufacturers to Amazon. There really isn't any other good way to handle it, and if you need to do it otherwise, that is a fundamental architectural problem that should have been resolved at the design phase.
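A minimal sketch of the "buffer until their API is back up" idea, assuming outbound calls go through a send function that raises when the partner is unreachable. `OutboundBuffer`, `PartnerUnavailable` and `FakePartner` are hypothetical names for illustration, and the in-memory deque stands in for the durable queue (database table, message broker) a real system would need so buffered work survives a restart.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict


class PartnerUnavailable(Exception):
    """Hypothetical error raised by a send function when the partner API is down."""


class OutboundBuffer:
    """Queues outbound payloads and delivers them in FIFO order when the partner is up."""

    def __init__(self, send: Callable[[Dict[str, Any]], None]) -> None:
        self._send = send
        self._pending: Deque[Dict[str, Any]] = deque()

    def submit(self, payload: Dict[str, Any]) -> None:
        # Always enqueue first, so a partner outage never loses the request.
        self._pending.append(payload)
        self.flush()

    def flush(self) -> int:
        """Try to deliver everything pending; stop at the first failure. Returns count sent."""
        delivered = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except PartnerUnavailable:
                break  # Partner still down: keep the rest buffered for a later flush.
            self._pending.popleft()  # Only drop the payload once it was accepted.
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._pending)


class FakePartner:
    """Stand-in for a third-party API that can be toggled up or down."""

    def __init__(self) -> None:
        self.up = False
        self.received = []

    def send(self, payload: Dict[str, Any]) -> None:
        if not self.up:
            raise PartnerUnavailable()
        self.received.append(payload)


partner = FakePartner()
buf = OutboundBuffer(partner.send)
buf.submit({"order": 1})  # Partner is down, so this stays buffered.
buf.submit({"order": 2})
print(len(buf))  # 2
partner.up = True
print(buf.flush())  # 2
print(len(buf))  # 0
```

In practice something like a scheduled job would call `flush()` periodically, ideally with backoff, so customer-facing requests are accepted immediately and delivery to the partner catches up once it recovers.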