by kaikai 3138 days ago
Chasing bugs and being on-call sound like core parts of a software engineer's job, rather than operational work.

That said, some teams at my company are experimenting with having a week-long rotation for "bread box" issues. Those include tending issues/PRs in open source repos, handling bugs as they come in, etc. That frees up the rest of the team to work on core feature work.

I like to keep a running list of smaller, non-urgent tasks that would otherwise get neglected. When I have a long-running script or need to take a break from another project, I can refer to the list.

2 comments

Chasing bugs? Yes. Being on-call? No. Not unless you signed up for that. Too many companies think they can just get Pagerduty going and sign up all their engineering staff for operations duty. This is stupid for a number of reasons, not least of which is that managed services get rid of most of the need for this and are typically cheaper than developer time.

Do some developers on the team need to think about scale? Yes. Should all the developers be on call because perhaps the company decided to roll its own infrastructure and someone has to deal with the occasional server with a full disk? No.

The flipside to this is that being on call forces developers to care about bugs in their code that cause operational headaches instead of just throwing releases with varying degrees of test coverage over the fence to ops. Funny how certain bugs that languished in the background get priority when the phone that rings at 3am belongs to the dev responsible for that code instead of some poor schmuck on the ops team.

This exactly. If the developers responsible for the problem (and the fix) aren't feeling the pain of being on-call, then nothing will change and the fallout will be left on support/ops (who will usually find a poorly thought out workaround).

Do developers need to be on-call to handle purely ops-related activities (low disk space, high system load, etc)? Absolutely not. Should developers be responsible for their "production-ready" code when it breaks? Definitely.

But the problem is if you assign a rotating duty to your engineering staff, you as an engineer have no direct impact on how often you will be called due to the half-assed work of other developers. It's a rocky road. Do this too much and your staff will leave. I certainly will. Life is too short.

In short, we're all describing poor management issues. Signing up all the developers for Pagerduty is a band-aid. So is pushing it all onto operations. In both cases, management is making a choice to avoid dealing with something that requires ongoing effort and time.

This works the other direction as well.

Management's risk-taking is essentially guaranteed by free employee overtime.

On call as a core part? Really? Thankfully I've never worked anywhere with such a "duty". Tbh, if my current place proposed it, I'd be applying for new jobs by lunchtime.

As a matter of interest, what's the standard pay for being on-call?

Every healthy engineering place I've worked at had developers on call. It's called "eating your own dog food". Devs should be responsible for the things they build - it affects the dev culture significantly when your shitty code can wake you up in the middle of the night.

So realistically, is the code going to be fixed at 3 a.m.? Why can't it wait until I'm in the next morning at 8 a.m. for a proper review, triage, priority listing and then fix?

I'm shocked that people would so easily give up their free time really, but to each their own.

How much does it pay extra?

> It's called "eating your own dog food".

No it's not, that's using your own product. Which I do.

> is the code going to be fixed at 3 a.m.?

Yes, if it matters. I can think of dozens of examples. E.g. you provide a payment processor, and you have clients worldwide.

At any significant size, it's going to matter if your service is down for 5 hours. Let's take an extreme example - let's say Google search goes down at 3am PST. Do you think the engineer on call is going to wait 5 hours before "triage, priority listing and then fix"? Are you kidding me?

I don't know what bubble you live in, but in the real world and for many businesses, outages out of hours matter. I'm sure in some places they don't (like maybe day trading systems).

If you don't want a call out, build your systems to be resilient to failure, and self-healing.

You've given an extreme example, not worth replying to really.

> I don't know what bubble you live in

The "bubble" that the vast majority of devs live in where on-call isn't a thing?

> If you don't want a call out

I won't agree to one, nor sign a contract where it's listed.

On the flip side - if your workplace is toxic or bad enough that you aren't allowed to fix systemic issues that cause outages, well then I can see your viewpoint. It's not worth being on call if you can't make things better.