Speaking as an ops person, my first thought is that you have technical or architectural debt. Obviously, very large or very rapidly growing systems will hit limits and need constant attention, but these days designing most applications to scale is not a hard problem.
Many of the operations issues I see these days stem from one or more deficiencies in the development process. Note that I don't say "deficiencies in developers". To develop safely at speed, you need a disciplined development process with appropriate feedback mechanisms: unit tests, integration tests, performance tests, static analysis, code review, and so on. The default state of code is "buggy", because humans are not perfect.