If you work a job where having one of these incidents once a year or more is "normal", then the dev team needs to devote most of its time to fixing that, or you need to change employers.
What I mean is that this is an issue naturally fixed by better development, and fixing it does require development skill, but an organization can hamstring its developers so that they can't fix it even when they have the skill to.
An incident only once a year is an absurd bar. I'm no fan of on-call, but ensuring that level of incident avoidance would force the company to move at glacial speed, which is even worse over the long term than getting paged.
I think my sweet spot is somewhere between once a week and once a month, spread across the whole team.
An incident that requires immediate developer intervention, rather than waiting until tomorrow? It seems like you would have to go out of your way to create a system so fragile that this happened once a month.
I worked at a telco that served a few tens of thousands of customers in a huge remote region.
There were so many systems held together with baling wire that it was rare to go a day without a significant outage, and usually there were several. Everyone who was remotely knowledgeable about tech was basically a firefighter.
I don't think this takes into account the reality of huge megacorps with tons of development teams situated globally who are constantly changing the codebase.
Incidents happen as code changes. Even once you fix one, the changing nature of the code can introduce more issues.
I've never worked at a megacorp, but if megacorp employees believe it is more acceptable for them to cause issues for customers than it is for a 3-dev company, that really seems like a skill issue for the megacorp.
If it is unacceptable to cause that downtime, you write code that makes the downtime much less likely.
I expect the scale here is not apples to apples. A three-person team is often on a small product, where downtime is often a catastrophe: the product is truly broken for customers. Meanwhile a megacorp often has many, many large products, and downtime usually means a piece of one of them is degraded.
My random guess is that the "downtime" is fairly proportional to the scale difference, with megas probably taking the edge.
Some (many?) employers make this difficult, and you should try to leave them.