There's breakage beyond "brought down". A system that's running but produces no output because the input data changed can be "broken" and in violation of its SLA too. That's not really something you can design around, short of "we won't promise anything", but then you lose to competitors that do take it on themselves to react quickly enough with hotfixes.
How fast can you grow if you’re constantly putting out fires? It sounds like you are in a B2B world. Businesses always want more. Where are the salespeople and customer service managers who can set realistic expectations about what the client requires vs. what they are expected to do? “logging into production and manually correcting stuff” only goes so far and doesn’t scale.