| Overview: On call one week in six. Pay is £300 per week for being available (more for national holidays), plus time-and-a-half for any time spent dealing with calls, rounded up to the next 15 minutes. Expected duties: Respond to major problems outside office hours when phoned. Be online fixing the problem within 15 minutes of being called. Typical workload is 2 hours per week. Be somewhere with a reliable internet connection where you can stay even if an issue takes a few hours to resolve (no mobile internet from campsites). Be sober. Fixes: Because of the need for code review and second testing, emergency code releases never happen out of hours. Instead, every code deployment includes a backout plan that reverts to a known-good version of the code (not a bug-free version, as there's no such thing, but one that hasn't caused major problems when run for several weeks). Data in the database may need to be fixed manually. If you're patching an issue by deleting some bad data and, in your judgement, deleting it might destroy evidence needed to identify the root cause of the problem, try to identify the root cause if time allows. Priority: The on-call engineer is only called for major problems: either their software has raised an alert saying they should be called, or there is a problem costing thousands of pounds a minute, such as "website down", "customers cannot place orders" or "customers will not receive orders". There isn't a formal SLA, but because such problems cost thousands of pounds a minute, the expectation is to fix them as fast as you safely can, without mistakes that make the problem worse. Escalation: To their team leader, or to another senior software engineer on their team (our team leaders have all been senior software engineers on the team they lead). APIs down: In a rather old-fashioned design move, all critical components are maintained in house. Contact the out-of-hours on-call engineer for whichever team maintains the broken system.
If it's a non-critical component. |