Managing on-prem hardware may not necessarily be hard, but it can be extremely time-consuming. To me, the nice thing about dropping a bunch of hardware in a colo is that you get to take a lot of shortcuts, and take risks, that you cannot buy from the public cloud providers.
I worked for a company that took the colo route (and I'd do it again), and it gave immense cost savings compared to public cloud by taking on risks you can't take elsewhere. As a raw startup, before they started investing in people to take care of the infra, it was just some servers and some unmanaged desktop switches. But that gave the company breathing room to survive, since the business model probably didn't work without it. It also gave them a reputation for unreliable service.
I've also built five-nines infra at telcos, and yes, you can do it with average engineers, but it's going to be time-consuming, slow, and expensive in both money and labor. To allow only 26 seconds of unplanned outage a month, you're going to be testing every firmware update for every piece of equipment on an ongoing basis, and rehearsing every operation and change as thoroughly as possible. And you need enough scale to hit that 26 s by having most outages affect only a subset of your customer base; otherwise you're going to blow through that outage budget fast.
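The 26-second figure is just five nines applied to a 30-day month: (1 - 0.99999) x 2,592,000 s is about 25.92 s. The "subset of your customer base" point works if outages are weighted by the fraction of customers they touch. A minimal sketch, assuming that customer-weighted definition and a 30-day month (real telco SLAs define availability in their own ways); the function names and outage numbers are hypothetical:

```python
# Assumption: availability is measured in customer-weighted seconds over a 30-day month.
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 60 * 60  # 2,592,000 s


def downtime_budget_seconds(availability: float, period_s: int = SECONDS_PER_30_DAY_MONTH) -> float:
    """Allowed unplanned downtime for an availability target over the period."""
    return (1.0 - availability) * period_s


def customer_weighted_downtime(outages: list[tuple[float, float]]) -> float:
    """Sum outage durations, each scaled by the fraction of customers it affected.

    Each outage is a (duration_seconds, fraction_of_customers_affected) pair.
    """
    return sum(duration * fraction for duration, fraction in outages)


if __name__ == "__main__":
    budget = downtime_budget_seconds(0.99999)
    print(f"five-nines budget: {budget:.2f} s/month")
    # Hypothetical month: a 5-minute outage hitting 5% of customers,
    # plus a 2-minute outage hitting 1% of customers.
    spent = customer_weighted_downtime([(300, 0.05), (120, 0.01)])
    print(f"weighted downtime: {spent:.1f} s, within budget: {spent <= budget}")
```

Under this definition, the same 5-minute outage hitting every customer would spend more than ten months of budget in one go, which is why blast radius matters so much at that availability target.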
Managing on-prem is definitely harder, because in the cloud you benefit from economies of scale on all the management problems you would otherwise have to pay for yourself, and if you don't have scale, you will significantly overpay to get the same quality, reliability, or responsiveness.
Most people are not paid to manage infra; they are paid to talk to customers, ship features, fix bugs, and handle other "core business" items. Likewise, most businesses don't build roads: they pay taxes and use them, because building roads for their own preferred traffic patterns would cost far more than they could justify (for now).
If you don't have scale, you don't need most of the features. Fire up a PC, load the application. Set up an egress port open to the internet. Set up application backups on a cron job. Done until scale problems arise.
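The "backup on a cron job" step can be as small as one script. A minimal sketch, assuming the application's state lives in a single directory and that a compressed tarball with a few days of retention is good enough; the backup function, its keep parameter, and every path below are hypothetical:

```python
import sys
import tarfile
import time
from pathlib import Path


def backup(src: Path, dest_dir: Path, keep: int = 7) -> Path:
    """Write a timestamped .tar.gz of src into dest_dir, then keep only the newest `keep` archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Local time; two runs within the same second would overwrite each other.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(src), arcname=src.name)  # adds the directory recursively
    # Timestamps sort lexicographically, so the oldest archives come first.
    archives = sorted(dest_dir.glob(f"{src.name}-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 backup.py SRC_DIR DEST_DIR
    print(backup(Path(sys.argv[1]), Path(sys.argv[2])))
```

Scheduling it is then one crontab line, for example 0 3 * * * python3 /opt/backup.py /srv/app/data /var/backups/app (paths hypothetical), which runs it daily at 03:00. Copying the archives off the box is the obvious next step, since a backup on the same PC dies with it.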
Correct, my point absolutely doesn't apply to someone who is just doing their own thing, or even something maybe two orders of magnitude bigger than their thing.
But when your local IT goon says it's going to be 8 months to procure the next set of hard drives for your next order of magnitude, that's a real problem, and you have real money to invest in solving it, just not own-a-data-center money.
"this is what we're paid for" Nope, it's what YOU'RE paid for.
I am paid to relax on my holidays, because I know my team and I won't have to drive to a colo to swap out a failing line card. I realized that time is worth money, and people quit jobs that take up too much of their time. I can A/B test (something on-prem guys NEVER get the luxury of doing), so outages just don't happen at all (fingers crossed).
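The comment doesn't say how that A/B testing is set up. One common pattern with this effect is a canary split: a stable hash of the user ID sends a fixed slice of users to the new version, so a bad release reaches only that slice instead of everyone. A minimal sketch of that technique (not necessarily the commenter's setup); the bucket and route names and the percentages are hypothetical:

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in [0, buckets) using SHA-256."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of users to the new version, the same users every time."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if bucket(user_id) < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(route(u, 5) == "canary" for u in users) / len(users)
    print(f"canary share at 5%: {share:.3f}")
```

Because the bucket is a pure function of the user ID, raising canary_percent from 5 to 10 keeps the original 5% in the canary and only adds new users, which makes a bad release easy to spot and roll back.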
I have rarely met anyone happy with their on-prem DC deployments, and since moving to the AWS world, it seems crazy how backwards it is to be anywhere but the cloud.
Anyone serious wouldn't "drive to the colo to swap out a failing line card"; they keep excess capacity and spares in the colo and have the facility's on-site personnel replace it.
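"Excess capacity" here usually means sizing for N+1 (or N+k): the fleet still has to carry peak load after losing k units, so a dead line card becomes a remote-hands ticket rather than an outage. A minimal sketch of that arithmetic, with hypothetical function names and numbers:

```python
import math


def survives_failures(node_capacity: float, node_count: int, peak_load: float, failures: int = 1) -> bool:
    """N+k check: does peak_load still fit after losing `failures` nodes?"""
    remaining = max(node_count - failures, 0)
    return remaining * node_capacity >= peak_load


def nodes_needed(node_capacity: float, peak_load: float, spares: int = 1) -> int:
    """Nodes required to carry peak_load, plus `spares` extra to absorb failures."""
    return math.ceil(peak_load / node_capacity) + spares


if __name__ == "__main__":
    # Hypothetical fleet: 4 nodes at 10,000 req/s each, 28,000 req/s at peak.
    print(survives_failures(10_000, 4, 28_000))              # True: 3 x 10k >= 28k
    print(survives_failures(10_000, 4, 28_000, failures=2))  # False: 2 x 10k < 28k
    print(nodes_needed(10_000, 28_000))                      # 4: ceil(2.8) + 1
```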
Honestly, it just sounds like the environment you describe has bigger organizational issues that aren't related to on-prem vs. cloud.
Compare something like rocketry or chemical engineering with running an on-prem DC. I don't see what the complaining is about. It's still a luxury compared to what other professions have to deal with.
On-prem is massively harder if you can't cut corners on security or reliability. Just things like testing and upgrading firmware, doing real DR testing (I know multiple places that spent lots of time and money on annual failover tests, but went down hard every time they had a true failure because of something they'd missed), and handling things like boot signing or secure logging all take up multiple FTEs' worth of time on-prem, or are a checkbox on a platform that handles them for you.
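As one small example of what "secure logging" involves when you build it yourself, a common ingredient is tamper evidence through hash chaining: each entry stores the hash of the previous one, so editing or deleting an entry breaks every later link. A minimal sketch of that technique only; the append and verify names are hypothetical, and on its own it does not stop someone who can rewrite the entire log and recompute every hash, which is why real setups also ship the latest hash or the entries off-host:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev: str, message: str) -> str:
    body = json.dumps({"prev": prev, "msg": message}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log: list[dict], message: str) -> dict:
    """Append an entry whose hash covers both its message and the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "msg": message, "hash": _digest(prev, message)}
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited, reordered, or removed entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["msg"]):
            return False
        prev = entry["hash"]
    return True
```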