by danpozmanter 2160 days ago
There's more going on, and worth exploring, if engineers are so unattached to outcomes that they pick technology in a vacuum. Pager duty is a heavy stick - it's important not to avoid root cause analysis. So if your engineers are making bad choices - what's really going on?

That assumes engineers are even empowered to make technology choices. At many companies they are not (whether by dint of organizational structure or the roadmap not allowing a major technology shift from whatever "shitpile" you and your team have inherited).

Clear escalation strategies (and knowing when escalation to the original engineers behind a project is even appropriate) are often lacking. I wouldn't want to call engineers in at 3am for a problem that can be fixed by following a documented devops process. Plus - what happens when the engineer you need to reach is unavailable? They are sick, or don't wake up, or their phone died?

What happens when business pressure says "we're ok with calling engineers twice a week as long as the roadmap moves"?

"You built it you're on call" is a fragile way to handle problems in more ways than one.

Which isn't to say there shouldn't be shared responsibility. Of course there should. But responsibility without power is toxic. At the very least it increases flight risk - but in practice often has a far wider reaching deleterious effect than just that.

2 comments

> I wouldn't want to call engineers in at 3am for a problem that can be fixed by following a documented devops process.

Why would there be a process that could be executed that wouldn't already be automated? If the ops guy is dealing with an issue, it's because all the known remediations have failed.

Containers already have auto restart on failed health checks. VMs have vMotion and HA for failed hardware. If the ops guy is up at 3am, dealing with a service you wrote, chances are high that you (or your team) should be involved for the quickest resolution.

A documented process can be automated.

Humans should be paged only for a new category of failure, and in that case, having the developer wake up first triggers a really good feedback loop.
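
To make the "automate the documented process, page only for new failures" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `Incident` type, the `runbook` mapping, and the `page` callback are assumed names, not any real paging or orchestration API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Incident:
    service: str
    category: str  # hypothetical label, e.g. "health_check_failed"


# A remediation is an automated runbook step; it returns True if it fixed the problem.
Remediation = Callable[[Incident], bool]


def handle(
    incident: Incident,
    runbook: Dict[str, Remediation],
    page: Callable[[Incident], None],
) -> str:
    """Try the documented (automated) fix first; page a human only if that is not enough."""
    remediation = runbook.get(incident.category)
    if remediation is not None and remediation(incident):
        return "auto-remediated"
    # Either the category is new, or the known remediation failed:
    # this is the case where waking someone up is justified.
    page(incident)
    return "paged"
```

With a runbook such as `{"health_check_failed": restart_container}` (where `restart_container` is an assumed helper), a known failure never reaches a human, while an unknown category or a failed remediation goes straight to whoever `page` notifies.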