Hacker News new | ask | show | jobs
by sofixa 888 days ago
> I think a lot of orgs move to cloud simply because it's popular and gartner told them so.

Nah, I think it's mostly about the second part of your comment. Everyone hates waiting for months to get a VM or a database or a firewall rule because the infrastructure/DBA teams are stuck ten years in the past and take pride in their artisanal infrastructure building.

So moving to the cloud eliminates a useless layer of time wasting,

6 comments

If your on-prem team can't spin up a VM same day, then firing them is probably higher ROI than "going to cloud". Further, a lot of the shops "going to cloud" because their infra team is slow.. then hide cloud behind their infra team.

A prior 200+ dev shop went from automated on-prem VM builds happening within hours from when you raise a ticket, to cloud where there was a slack channel to nag&beg for an EC2 which could take a day to a week. This was not a temporary state of affairs either, it was allowed to run like this for 2 years+.

Oh and, worth mentioning, CTO there LOVED him some Gartner.

Despite years of friendly sounding devops philosophy there's times when devs and ops are fundamentally going to be in conflict. it's sort of a proxy war between devs who understandably dislike red tape and management who loves it, with devops caught in the middle and on the hook for both rapid delivery of infrastructure but also some semblance of governance.

An org with actual governance in place really can't deliver infra rapidly, regardless of whether the underlying stuff is cloud or on prem, because whatever form governance takes in practice it tends to be distributed, i.e. everyone wants to be consulted on everything but they also want their own responsibility/accountability to be to be diluted. Bureaucracy 101..

Devs only see ops taking too long to deliver, but ops is generally frozen waiting on infosec, management approving new costs, data stewards approving new copies across ends, architects who haven't yet considered/approved whatever Outlandish new toys the junior devs have requested, etc etc.

Depends on exactly what you're building but with a competent ops team cloud vs on prem shouldn't change that much. Setting aside the org level externalities mentioned above, developer preference for stuff like certain AWS apis or complex services is the next major issue for declouding. From the ops perspective cloud vs on prem is largely gonna be the same toolkit anyway (helm, terraform, ansible, whatever)

Whilst often true in practice, this doesn't have to be true.

The reality is, a lot of these orgs have likely already discovered devops, pipelines, deployment strategies, observability, and compliance as code.

There's basically little in compliance that can't be automated with patterns and platforms, but in most of these organizations a delivery teams interface with the org is their non-technical delivery manager who folds like a beach chair when they're told no by the random infosec bod who's afraid of automation.

I've cracked this nut a few times though. It requires you be stubborn, talk back, and have the gravitas and understanding to be taken seriously. i.e. yelling that's dumb doesn't work, but asking them for a list of what they'd check, and presenting an automated solution to their group, where they can't just yell no, might.

Yes, of course management is often the problem.

I think it helps when people actually take a step back and understand where the money that pays their salary comes from. Often times people are so ensconced in their tech bureaucracy they think they are the tail that wags the dog. Sometimes the people that are the most hops from the money are the least aware of this dynamic. Bureaucracies create an internal logic of their own.

If I am writing some internal software for a firm that makes money selling widgets, and I decide that what we really need is a 3 year rewrite of my app for reasons, am probably not helping in the sale or the production of widgets. If another team is provisioning hardware for me to write the software on, and it now takes 2 weeks to provision virtual hardware that could take seconds, then they are also not helping in the sale or the production of widgets.

These are the kind of orgs that someone may one day walk into, blast 30% of the staff, and find no impact on widget production, and obvious 30% savings on widget costs...

> If another team is provisioning hardware for me to write the software on, and it now takes 2 weeks to provision virtual hardware that could take seconds, then they are also not helping in the sale or the production of widgets.

Well in this example, the ops team slowing down pointless dev work by not delivering the platform that work is going to happen on quickly are effectively engaged in costs savings for the org. The org is not paying for the platform, which helps them because the project might be canceled anyway, and plus the slow movement of the org may give them time to organize and declare their real priorities. Also due to the slow down, the dev and the ops team are potentially more available to fix bugs or whatnot in actual widget-production. It's easy to think that "big ships take a while to turn" is some kind of major bug or at least an inefficiency, but there are also reasons orgs evolve in that direction and times when it's adaptive.

> Often times people are so ensconced in their tech bureaucracy they think they are the tail that wags the dog.

Part of my point is that, in general, departments develop internal momentum and resist all interface/integration with other departments until or unless that situation is forced. Structurally, at a lot of orgs of a certain size, that integration point is ops/devops/cloud/platform teams (whatever you call them). Most people probably can't imagine being held responsible for lateness on work that they are also powerless to approve, but for these kind of teams the situation is almost routine. In that sense, simply because they are an integration point, it's almost their job to absorb blame for/from all other departments. If you're lucky management that has a clue can see this happening, introduce better processes and clarify responsibilities.

Summarizing all that complexity and trying to reduce it to some specific technical decision like cloud vs on-prem is usually missing the point. Slow infra delivery could be technical incompetence or technology choices, but in my experience it's much more likely a problem with governance / general org maturity, so the right fix needs to come from leadership with some strong stable vision of how interdepartmental cooperation & collaboration is supposed to happen.

I've never seen an IT team that couldn't spin up a VM in minutes. I have seen a bunch of teams that weren't allowed to because of ludicrous "change control" practices. Fire the managers that create this state of affairs, not the devops folks, regardless of whether you "go cloud" or not.
I've met multiple customers where time to get a VM was in the weeks to months. (To be fair, I'm at a vendor that proposed IaC tooling and general workflows and practices to move away from old school ClickOps ticket-based provisioning, so of course we'd get those types of orgs).

And more often than not, it had nothing to do with managers, but with individual contributors resisting change because they were set in their ways and were potentially afraid for their jobs. Same applies for firewall changes btw.

I think a lot of HN crowd hangs out at FAANG/FAANG adjacent or at least young/lean shops, and has no idea how insane it is out there.

I was at a shop that provisions AWS resources via written email requests & clickops, treated fairly similar to a datacenter procurement. Teams don't have access to the AWS console, cannot spin up/down, stop, delete, etc resources.

A year later I found out that all the stuff they provisioned wasn't set up as reserved instances. We weren't even asked. So we paid hourly rates for stuff running 24/7/365.

This was apparently the norm in the org. You have to know reserved instances exist, and ask for them.. you may eventually be granted the discount later. I only realized what they had done when they quoted me rates and I was cross checking ec2instances.info I can guarantee you less than 20% of my org (its not a tech shop) is aware this difference exists, let alone that ec2instances.info exists for cross reference.

No big deal, just paying 2x for no reason on already overpriced resources!

I went from that type of world (cell carrier) to a FAANG type company and it was shocking. The baseline trust that engineers were given by default was refreshing and actually a bit scary.

I’m not sure my former coworkers would have done well in an environment with so few constraints. Many of them had grown accustomed to (and been rewarded and praised for) only taking actions that would survive bureaucratic process, or fly underneath it.

The problem is the strong players are less likely to stick around, so you often do end up with folks who can't do the work in minutes - though, the work is usually slightly more than clicking the "give me the vm" button.
Teams are what they DO, not what they CAN DO.
Ok, but I’m not sure what that has to do with what I posted.
> If your on-prem team can't spin up a VM same day, then firing them is probably higher ROI than "going to cloud".

I haven’t seen this be due to one set of incompetents since the turn of the century. What I have seen is this caused by politics, change management politics, and shortsighted budgetary practices (better to spend thousands of dollars per day on developers going idle or building bizarre things than spend tens on infrastructure!).

In such cases, the only times where firing someone would help would be if they were the C-level people who created and sustained that inefficient system.

They probably should be fired, but it's actually complicated because the orgs tend to be staffed with departments that believe this is the way things should be done, and best case the replacement needs to compromise with them, worse case they are like minded and you just get more of the same.
It also allows management to hide bad decisions and poor planning.

Project is a dud? just nuke the cloud project and no more charges for it.

Project is poorly architected and running like a dog? throw more resources at it.

Both of the above are harder to hide when you have to order equipment for on prem.

If you're running an internal cloud, you can likely absorb that.

I think comes down to a couple of things:

- Small orgs don't have the resources to run internal clouds, nor should they be doing so. This limits the pipeline of available candidates. - Large orgs promote the wrong people to management, and they make decisions based on their mental model of the world that was developed 20 years ago. They're filled with people who don't understand the difference between cloud and virtualization. - Large consultancies make more money by throwing raw numbers at the problem rather than smart automation. i.e. it's easier for IBM to bill T&M and a whole project wrapper to patch the server than automate it. - Finance & HR teams want you to bend to their ways of working rather than the opposite.

Of the rest, you get into many of them are simply in ops because they're less skilled software developers, or they're now being asked to assure security, and that scares them so they try to lock everything down.

> Project is a dud? just nuke the cloud project and no more charges for it.

How is that a negative? Not every project is going to be successful. That's just a basic fact of life. That you don't have to deal with the sunk cost fallacy and just pull the plug is a good thing.

> Project is poorly architected and running like a dog? throw more resources at it.

Another positive...?! You can continue to serve your clients and maintain a revenue stream while you work on a better architecture. Instead of failing completely. And once you need less ressources, you easily scale down.

If only that were the case.

Even when everything is in IaC and 100% cloud-native, I’ve still seen dev teams bypass the approved methods because ClickOps is easier.

> waiting for months to get a VM or a database or a firewall rule because the infrastructure/DBA teams are stuck..

You still have to go through your devops (or equivalent) team to make any network configuration/permission changes. Whether that change is implemented by a local firewall rule or some AWS configuration change is not very important.

It's not like you're going to have developers changing AWS access permissions directly. Maybe in a few employee startup, but in any regulated & audited company, you must have separation of duties and audited change control process.

That time wasting is back in the same form or another. For instance, at my side they enable *all* the Azure Application Gateway, even those rules that Microsoft says not to enable - causing even simple OpenID redirects from Microsoft AzureID (Microsoft login) to the application to get captured in AAG and fail.
The layer will still be there, because those teams are now managing some cloud infrastructure central to the organization.