	by dkjaudyeqooe 913 days ago
	With the last (and only) job that required me to be on call I quit the day before I was scheduled. I've always refused to do it. Devs have no business doing it.

2 comments

tdeck 912 days ago

I appreciate setting boundaries, but I don't really understand this attitude. On-call issues are frequently caused by problems with the application logic, so solving them requires an understanding of the code. In my experience, on-call issues are rarely a simple case of force-restarting something or provisioning more boxes, although that can happen from time to time.


dkjaudyeqooe 912 days ago

A system that can get itself into a non-functioning state and that can't be supported by an operator or dedicated support person is fundamentally broken and should not be in production. In my view devs should never have access to production, under any circumstances, ever.

This is an artifact of devs (and others) not knowing what they're doing, and just hacking and hoping for the best. It's really not that hard to develop a system that is reliable and supportable in a basic way. Understanding the code shouldn't be a requirement, but understanding the system should, and that's a requirement of support personnel. Put another way, the functional model of the system has to be at a higher level than the code.


tdeck 912 days ago

I'd argue that a software system that can be supported by a dedicated operator who isn't a developer is fundamentally broken. Any response protocol that can be handled by someone without familiarity with the system's internal workings can fundamentally be automated. Scaling hardware? Restarting boxes? Ignoring and silencing an alert? Draining traffic to a bad host? These are all fairly simple actions that could theoretically be automated, and have been automated at many companies. There shouldn't be a need for a person who can only do things like this.

On the other hand, there will be production issues caused by a complex interaction within the system that arises in an unforeseen edge case. These issues frequently require a code change which requires the ability to understand the codebase. In that case, the system is broken, but it's not "fundamentally" broken, it's broken for a particular edge case. Unfortunately, we may not have the luxury of waiting until 10 AM PST to start looking into the problem and coming up with a fix for it.
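
To illustrate the distinction drawn in this comment, here is a minimal sketch, assuming a hypothetical alert format and made-up action names (nothing here comes from a real alerting tool). Routine responses such as restarting a box, scaling out, or draining a bad host are looked up in a runbook table and handled automatically; anything the table does not cover is escalated to someone who knows the code.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Alert:
    """Hypothetical alert shape: a kind label and the affected host."""
    kind: str
    host: str


# Each action just returns a description of what it would do.
# In a real system these would call out to orchestration tooling.
def restart_host(alert: Alert) -> str:
    return f"restart {alert.host}"


def scale_out(alert: Alert) -> str:
    return f"provision one more box alongside {alert.host}"


def drain_traffic(alert: Alert) -> str:
    return f"drain traffic from {alert.host}"


# Runbook: alert kinds with a known, mechanical response (names are assumptions).
RUNBOOK: Dict[str, Callable[[Alert], str]] = {
    "process_hung": restart_host,
    "cpu_saturated": scale_out,
    "host_unhealthy": drain_traffic,
}


def handle(alert: Alert) -> str:
    """Run the automated response if one exists, otherwise escalate."""
    action = RUNBOOK.get(alert.kind)
    if action is None:
        # No mechanical fix is known: this needs someone who understands the code.
        return f"page developer: {alert.kind} on {alert.host}"
    return action(alert)


if __name__ == "__main__":
    print(handle(Alert("process_hung", "web-1")))
    print(handle(Alert("unforeseen_edge_case", "db-2")))
```

The fallback branch is the part the argument turns on: whatever cannot be expressed as a table entry is, by this reasoning, the work that requires familiarity with the system's internals.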


Octabrain 912 days ago

> Devs have no business doing it.

Agreed, but I have to say that, as a DevOps engineer, it was infuriating to deal with developers who didn't care about the quality of what they were delivering. Sometimes it was due to pressure from someone higher in the chain; other times, pure laziness and/or incompetence. I remember coming in the morning after a hell of a night on call, reporting the issues to the devs in charge, and being told something along the lines of "fixing that is not the priority right now". I replied in anger, "If it was your damn phone ringing the whole night, I'm pretty sure you would make it a priority".


dkjaudyeqooe 912 days ago

There should be some sort of trigger whereby, over a certain threshold of problems, the devs have to perform the support role. It's unacceptable to deliver a shitty system and rely on support to avert disaster or user revolt; there has to be some sort of incentive to counter this.
