|
What you are describing is an org smell[0], I think. On-call should be used to handle urgent, emergent situations that need to be addressed at once in order to keep the business running. What you are describing as the responsibilities of your on-call rotation explicitly includes non-urgent problems: bugs, customer support, reporting. These all need to be handled by any competent organization, but they are routine matters of any software system, and they should be handled in a routine fashion. For a small company it makes sense for the founders to do all of this, and systems will need to be developed to manage the inevitable overflow of bugs, support requests, and reporting.
The fact that this is handled by the on-call engineer in your organization suggests a failure of organizational design: there are "important" tasks like adding new features and "non-important" tasks like fixing bugs (!), communicating with your users (!) and doing root cause analysis of incidents (!). To put things simply, there are jobs in your organization that are not the responsibility of anyone, and thus when they are encountered they go onto the heap of "non-important" things to do. This is unfortunately common in software-making organizations. The problem is that if this heap gets too large it catches on fire. And allocating an engineer to spray water on this flaming trash heap on a reliable schedule is not what most people consider to be a fulfilling part of their employment.
So to answer your inquiry: perhaps, in addition to giving extraordinary compensation for work which is by definition extraordinary (if it's ordinary work, why does it need a special on-call system to handle it?), it is also best to make sure that items which regularly end up on the on-call heap become the responsibility of a person.
In an early-stage company, customer support can be handled by the founder, bugs can be handled as part of sprints, and root cause analysis should be done as the final part of any on-call alert as a matter of good practice. It's my belief, again, that making on-call unreasonably expensive incentivizes the larger organization to create a system that handles bugs, customer support, and reports before they end up on the flaming trash heap, and that in the long term this reduces costs, churn, and burnout. I again point to Will Larson because I developed all my thinking on this based on his works.[1]
To put it succinctly: making on-call just another job responsibility incentivizes the creation of an eternal flaming trash heap that a single, poor engineer is responsible for firefighting on a reliable schedule (not fun). Recognizing that on-call is by its nature an extraordinary job responsibility, and compensating engineers on alert in extraordinary fashion, incentivizes the larger organization, i.e. executives, directors and managers, to build systems to minimize, extinguish, and eventually destroy the flaming trash heap (yay).
[0] Organization smell, analogous to a "code smell", where a programmer with sufficient intuition can tell something is amiss without being able to precisely describe it immediately.
[1] https://lethain.com/doing-it-harder-and-hero-programming/. I recommend buying "An Elegant Puzzle" because some of his best essays on the subject of on-call are only available in the book, not on his blog. |