by maeln 881 days ago
> Seriously, if you’re at the point that you’re doing sophisticated analysis of cloud costs, consider dropping the cloud.

Which would mean that you lose part of the reason to use the cloud in the first place... A lot of orgs move to cloud-based hosting because it enables them to go way further in FinOps / cost control (amongst many other things).

This can make a lot of sense depending on your infra. If you have some fluctuation in your needs (storage, compute, etc.), a cloud-based solution can be a great fit.

At the end of the day, it is just a tool. I worked in places where I SSH'd into the prod bare-metal server to update our software, manage the firewall, check the storage, ... and all that manually. And I worked in places where we were using a cloud provider for most of our hosting needs. I also handled the transition from one to the other. All I can say is: It's a tool. "The cloud" is no better or worse than a bare-metal server or a VPS. It really depends on your use-case. You should just do your due diligence and evaluate the reason why one would fit you more than the other, and reevaluate it from time to time depending on the changes in your environment.

This whole "cloud bad" is just childish.


> A lot of orgs move to cloud-based hosting because it enables them to go way further in FinOps / cost control

I think a lot of orgs move to cloud simply because it's popular and Gartner told them so.

But taking a step away from that, it's really about self-service. When the alternative is logging a ticket for someone to manually misconfigure a VM and then fail to send you the login credentials, then your delivery is slow.

When you're chasing revenue, going slow means you're leaving money on the table. When you're a big bureaucratic org, it means your middle managers can't claim to have delivered a whole bunch of shit. Nobody likes being held up, but that's what infrastructure teams historically do.

> I think a lot of orgs move to cloud simply because it's popular and gartner told them so.

Nah, I think it's mostly about the second part of your comment. Everyone hates waiting for months to get a VM or a database or a firewall rule because the infrastructure/DBA teams are stuck ten years in the past and take pride in their artisanal infrastructure building.

So moving to the cloud eliminates a useless layer of time wasting.

If your on-prem team can't spin up a VM same day, then firing them is probably higher ROI than "going to cloud". Further, a lot of the shops "going to cloud" because their infra team is slow.. then hide cloud behind their infra team.

A prior 200+ dev shop went from automated on-prem VM builds happening within hours from when you raise a ticket, to cloud where there was a slack channel to nag&beg for an EC2 which could take a day to a week. This was not a temporary state of affairs either, it was allowed to run like this for 2 years+.

Oh and, worth mentioning, CTO there LOVED him some Gartner.

Despite years of friendly-sounding devops philosophy, there are times when devs and ops are fundamentally going to be in conflict. It's sort of a proxy war between devs who understandably dislike red tape and management who loves it, with devops caught in the middle and on the hook for both rapid delivery of infrastructure and some semblance of governance.

An org with actual governance in place really can't deliver infra rapidly, regardless of whether the underlying stuff is cloud or on prem, because whatever form governance takes in practice it tends to be distributed, i.e. everyone wants to be consulted on everything but they also want their own responsibility/accountability to be diluted. Bureaucracy 101...

Devs only see ops taking too long to deliver, but ops is generally frozen waiting on infosec, management approving new costs, data stewards approving new copies across ends, architects who haven't yet considered/approved whatever outlandish new toys the junior devs have requested, etc etc.

Depends on exactly what you're building, but with a competent ops team cloud vs on prem shouldn't change that much. Setting aside the org-level externalities mentioned above, developer preference for stuff like certain AWS APIs or complex services is the next major issue for declouding. From the ops perspective cloud vs on prem is largely gonna be the same toolkit anyway (helm, terraform, ansible, whatever).

Whilst often true in practice, this doesn't have to be true.

The reality is, a lot of these orgs have likely already discovered devops, pipelines, deployment strategies, observability, and compliance as code.

There's basically little in compliance that can't be automated with patterns and platforms, but in most of these organizations a delivery team's interface with the org is their non-technical delivery manager who folds like a beach chair when they're told no by the random infosec bod who's afraid of automation.

I've cracked this nut a few times though. It requires you be stubborn, talk back, and have the gravitas and understanding to be taken seriously. i.e. yelling that's dumb doesn't work, but asking them for a list of what they'd check, and presenting an automated solution to their group, where they can't just yell no, might.
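To make that concrete, here's a minimal sketch of what "a list of what they'd check" can look like once it's written as code. The resource fields (`encrypted`, `public_ip`, `tags`) and the three rules are hypothetical, made up for illustration rather than taken from any real infosec policy or cloud provider; the point is only that each manual check becomes a named, reviewable function.

```python
from typing import Callable, Dict, List

# A provisioning request, described as plain data (hypothetical fields).
Resource = Dict[str, object]
Check = Callable[[Resource], bool]

# Each entry is one item from the reviewer's checklist, expressed as code.
CHECKS: Dict[str, Check] = {
    "disk is encrypted": lambda r: r.get("encrypted") is True,
    "no public IP": lambda r: not r.get("public_ip", False),
    "has an owner tag": lambda r: bool(r.get("tags", {}).get("owner")),
}


def failed_checks(resource: Resource) -> List[str]:
    """Return the names of every check the resource does not satisfy."""
    return [name for name, check in CHECKS.items() if not check(resource)]


vm_request = {
    "encrypted": True,
    "public_ip": True,
    "tags": {"owner": "payments-team"},
}

failures = failed_checks(vm_request)
print("approved" if not failures else f"rejected: {failures}")
```

The reviewers still own the rules; they just review a change to `CHECKS` once instead of reviewing every ticket by hand.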

Yes, of course management is often the problem.

I think it helps when people actually take a step back and understand where the money that pays their salary comes from. Often times people are so ensconced in their tech bureaucracy they think they are the tail that wags the dog. Sometimes the people that are the most hops from the money are the least aware of this dynamic. Bureaucracies create an internal logic of their own.

If I am writing some internal software for a firm that makes money selling widgets, and I decide that what we really need is a 3 year rewrite of my app for reasons, I am probably not helping in the sale or the production of widgets. If another team is provisioning hardware for me to write the software on, and it now takes 2 weeks to provision virtual hardware that could take seconds, then they are also not helping in the sale or the production of widgets.

These are the kind of orgs that someone may one day walk into, blast 30% of the staff, and find no impact on widget production, and obvious 30% savings on widget costs...

> If another team is provisioning hardware for me to write the software on, and it now takes 2 weeks to provision virtual hardware that could take seconds, then they are also not helping in the sale or the production of widgets.

Well, in this example, the ops team slowing down pointless dev work by not quickly delivering the platform that work is going to happen on is effectively engaged in cost savings for the org. The org is not paying for the platform, which helps them because the project might be canceled anyway, and plus the slow movement of the org may give them time to organize and declare their real priorities. Also due to the slowdown, the dev and the ops team are potentially more available to fix bugs or whatnot in actual widget production. It's easy to think that "big ships take a while to turn" is some kind of major bug or at least an inefficiency, but there are also reasons orgs evolve in that direction and times when it's adaptive.

> Often times people are so ensconced in their tech bureaucracy they think they are the tail that wags the dog.

Part of my point is that, in general, departments develop internal momentum and resist all interface/integration with other departments until or unless that situation is forced. Structurally, at a lot of orgs of a certain size, that integration point is ops/devops/cloud/platform teams (whatever you call them). Most people probably can't imagine being held responsible for lateness on work that they are also powerless to approve, but for these kinds of teams the situation is almost routine. In that sense, simply because they are an integration point, it's almost their job to absorb blame for/from all other departments. If you're lucky, management that has a clue can see this happening, introduce better processes, and clarify responsibilities.

Summarizing all that complexity and trying to reduce it to some specific technical decision like cloud vs on-prem is usually missing the point. Slow infra delivery could be technical incompetence or technology choices, but in my experience it's much more likely a problem with governance / general org maturity, so the right fix needs to come from leadership with some strong stable vision of how interdepartmental cooperation & collaboration is supposed to happen.

I've never seen an IT team that couldn't spin up a VM in minutes. I have seen a bunch of teams that weren't allowed to because of ludicrous "change control" practices. Fire the managers that create this state of affairs, not the devops folks, regardless of whether you "go cloud" or not.

I've met multiple customers where time to get a VM was in the weeks to months. (To be fair, I'm at a vendor that proposed IaC tooling and general workflows and practices to move away from old school ClickOps ticket-based provisioning, so of course we'd get those types of orgs).

And more often than not, it had nothing to do with managers, but with individual contributors resisting change because they were set in their ways and were potentially afraid for their jobs. Same applies for firewall changes btw.

I think a lot of the HN crowd hangs out at FAANG/FAANG-adjacent or at least young/lean shops, and has no idea how insane it is out there.

I was at a shop that provisions AWS resources via written email requests & clickops, treated fairly similarly to a datacenter procurement. Teams don't have access to the AWS console, and cannot spin up/down, stop, delete, etc. resources.

A year later I found out that all the stuff they provisioned wasn't set up as reserved instances. We weren't even asked. So we paid hourly rates for stuff running 24/7/365.

This was apparently the norm in the org. You have to know reserved instances exist, and ask for them... you may eventually be granted the discount later. I only realized what they had done when they quoted me rates and I was cross-checking ec2instances.info. I can guarantee you less than 20% of my org (it's not a tech shop) is aware this difference exists, let alone that ec2instances.info exists for cross reference.

No big deal, just paying 2x for no reason on already overpriced resources!
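For anyone who wants to see the size of that gap on their own numbers, here's a rough back-of-the-envelope sketch. The two hourly rates and the instance count are hypothetical placeholders, not real AWS prices; plug in the on-demand and effective reserved rates for your actual instance type.

```python
HOURS_PER_YEAR = 24 * 365  # an always-on workload, as described above


def annual_cost(hourly_rate: float, instance_count: int = 1) -> float:
    """Cost of running instances non-stop for a year at a flat hourly rate."""
    return hourly_rate * HOURS_PER_YEAR * instance_count


# Hypothetical rates in USD per hour (NOT real AWS pricing).
ON_DEMAND_RATE = 0.20
RESERVED_RATE = 0.10  # effective hourly rate after a commitment discount

on_demand = annual_cost(ON_DEMAND_RATE, instance_count=10)
reserved = annual_cost(RESERVED_RATE, instance_count=10)

print(f"on-demand: ${on_demand:,.0f}/year")
print(f"reserved:  ${reserved:,.0f}/year")
print(f"ratio:     {on_demand / reserved:.1f}x")
```

The ratio only holds for things that really do run all year; for anything spun up and down, the commitment can cost more than it saves.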

The problem is the strong players are less likely to stick around, so you often do end up with folks who can't do the work in minutes - though, the work is usually slightly more than clicking the "give me the vm" button.

Teams are what they DO, not what they CAN DO.

Ok, but I’m not sure what that has to do with what I posted.

> If your on-prem team can't spin up a VM same day, then firing them is probably higher ROI than "going to cloud".

I haven’t seen this be due to one set of incompetents since the turn of the century. What I have seen is this caused by politics, change management politics, and shortsighted budgetary practices (better to spend thousands of dollars per day on developers going idle or building bizarre things than spend tens on infrastructure!).

In such cases, the only times where firing someone would help would be if they were the C-level people who created and sustained that inefficient system.

They probably should be fired, but it's actually complicated because the orgs tend to be staffed with departments that believe this is the way things should be done, and best case the replacement needs to compromise with them, worst case they are like-minded and you just get more of the same.

It also allows management to hide bad decisions and poor planning.

Project is a dud? just nuke the cloud project and no more charges for it.

Project is poorly architected and running like a dog? throw more resources at it.

Both of the above are harder to hide when you have to order equipment for on prem.

If you're running an internal cloud, you can likely absorb that.

I think it comes down to a couple of things:

- Small orgs don't have the resources to run internal clouds, nor should they be doing so. This limits the pipeline of available candidates.
- Large orgs promote the wrong people to management, and they make decisions based on their mental model of the world that was developed 20 years ago. They're filled with people who don't understand the difference between cloud and virtualization.
- Large consultancies make more money by throwing raw numbers at the problem rather than smart automation. i.e. it's easier for IBM to bill T&M and a whole project wrapper to patch the server than automate it.
- Finance & HR teams want you to bend to their ways of working rather than the opposite.

As for the rest, many of them are simply in ops because they're less skilled software developers, or they're now being asked to assure security, and that scares them, so they try to lock everything down.

> Project is a dud? just nuke the cloud project and no more charges for it.

How is that a negative? Not every project is going to be successful. That's just a basic fact of life. That you don't have to deal with the sunk cost fallacy and just pull the plug is a good thing.

> Project is poorly architected and running like a dog? throw more resources at it.

Another positive...?! You can continue to serve your clients and maintain a revenue stream while you work on a better architecture, instead of failing completely. And once you need fewer resources, you can easily scale down.

If only that were the case.

Even when everything is in IaC and 100% cloud-native, I’ve still seen dev teams bypass the approved methods because ClickOps is easier.

> waiting for months to get a VM or a database or a firewall rule because the infrastructure/DBA teams are stuck..

You still have to go through your devops (or equivalent) team to make any network configuration/permission changes. Whether that change is implemented by a local firewall rule or some AWS configuration change is not very important.

It's not like you're going to have developers changing AWS access permissions directly. Maybe in a startup with a few employees, but in any regulated & audited company, you must have separation of duties and an audited change control process.

That time wasting is back in one form or another. For instance, at my site they enable *all* the Azure Application Gateway rules, even those that Microsoft says not to enable - causing even simple OpenID redirects from Microsoft AzureID (Microsoft login) to the application to get captured in AAG and fail.

The layer will still be there, because those teams are now managing some cloud infrastructure central to the organization.

> I think a lot of orgs move to cloud simply because it's popular

This can be rational and not just following the leader. In particular, many devs might think that working with an org that does on-prem is bad for their career, and they might be right. So from an org POV you can't hire good engineers if you're perceived as a dinosaur. This actually might be enough to send you towards the cloud even if the price by itself makes no sense.

I've experienced the opposite too: orgs looking down on cloud-only devs.

The idea being that devs who lean on cloud excessively do that to mask their lack of fundamentals, which will cause costly fuck ups no matter what technology they use, cloud or on-prem.

Maybe directionally similar idea to hiring ex-Googlers? Some orgs also don't like those. Specific mindset, specific toolbox.

It is absolutely true that some devs have the AWS product set as the tech toolkit they know best.

Whatever their fundamental skills are, the most important way they add value is by optimizing things like lambda startup time or EC2 CPU utilization. Does this allow them to mask deep problems with fundamentals? I guess it could, but that sounds a bit gatekeep-y to me.

Sort of, but IDK, if you have specific needs this might be a somewhat reasonable heuristic for hiring.

Devs who came up building software more or less from scratch really do have a different skillset than ones who stick to working in service-rich environments, because there's a significant difference between glueing services together vs building out those same services. For example, something like using a paginated API is quite a bit easier than designing/implementing one. A developer who is skilled and methodical about reading and understanding service-level documentation may not actually be able to step through debugging in a REPL, and vice versa. (Not to say that either kind of person cannot learn the other person's tricks, but as far as the differences in what they already know, those can be pretty significant.)
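A toy illustration of that asymmetry, as a sketch rather than how any real service does it: `list_items` is the provider side and `iter_all_items` is the consumer side. The names, the in-memory `ITEMS` list, and the integer-offset cursor are all assumptions made up for the example.

```python
from typing import Iterator, Optional

ITEMS = [f"item-{i}" for i in range(10)]  # stand-in for a real backing store


def list_items(cursor: Optional[int] = None, limit: int = 4) -> dict:
    """Provider side: return one page plus the cursor for the next page."""
    start = 0 if cursor is None else cursor
    if start < 0 or limit < 1:
        raise ValueError("invalid cursor or limit")
    page = ITEMS[start:start + limit]
    next_start = start + limit
    # Signal the end of the data by returning no cursor at all.
    next_cursor = next_start if next_start < len(ITEMS) else None
    return {"items": page, "next_cursor": next_cursor}


def iter_all_items(limit: int = 4) -> Iterator[str]:
    """Consumer side: keep asking until the provider stops returning a cursor."""
    cursor = None
    while True:
        response = list_items(cursor=cursor, limit=limit)
        yield from response["items"]
        cursor = response["next_cursor"]
        if cursor is None:
            break


all_items = list(iter_all_items())
print(len(all_items), "items fetched")
```

The consumer is a loop. The provider has to decide on input validation, how the end of the data is signalled, and (in a real system) what a cursor means when the underlying data changes between calls, which this sketch ignores entirely.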

Assuming someone only has one of these skillsets, the most valuable one totally depends on the situation. On the one hand it's pretty cool that service-familiarity tends to be language-agnostic, but it's less cool when your S3-API expert barely understands the basics of tooling in the new language.

Paginated API is a great example. For me, I learned C from K&R and producing a.out files that would segfault and leave a core file in $HOME. If I wanted a list structure, I had to build it out of resizable arrays of pointers, etc.

I ended up years later at AWS, and while I was there I built internet-facing paginated APIs over resources which had a variety of backing stores, each of which had some behavior I had to reason about.

So I don’t doubt the difference between API builder and API user, I’ve been both. I think it’s less about what you are doing and more about how you do it (with curiosity about how things work, vs. as an incurious gluer).

That said, looking at the code inside MySQL is highly instructive for the curious; AWS doesn’t provide that warts-and-all visibility into their implementations, which cuts off the learning journey through the stack.

There's also now regulatory capture. Your bare-metal or VPS solution won't be "FEDRAMP" approved, even though there are fewer moving parts to secure.

I have worked in places where adding a new server is a bureaucratic nightmare at best.

Granted, I don't think that's the norm, but hosting your webserver yourself is also not as user-friendly as AWS.

People always forget that.

>> This whole "cloud bad" is just childish.

Not childish…. it’s a growing line of thought in the IT community that has bought the cloud sell unquestioningly for 20 years.

But it's the same mindset as "cloud good", which was also a growing line of thought once. Mantras aren't useful; tradeoff analysis is useful.

The GP said "[if X happens] consider dropping the cloud". Which is totally different from a mantra.

There is virtually nobody saying "cloud bad" without nuance.

Mantras are good for orgs that are not mature enough to do actual analysis. A lead developer recently left where I work, and while higher pay was likely the biggest factor in the decision to move, I suspect the real reason he left is that the higher-ups simply don't listen when he says things like: you can't just run full virtual machines on Azure and refuse any rewrite/redesign while complaining about high Azure spend.

AWS is infamous in financial services for this though.

First they give you a ton of credits, assign you internal resources to help.

Then they encourage you to simply "lift and shift" your workloads onto EC2/EBS/EFS/etc. It's 100% compatible with your current system, you can rollback, etc. This takes two years, then you notice your AWS bill is 10x your old infra.

Then they say - of course, that's because you need to rewrite it all to serverless/microservices/etc that are all AWS bespoke branded alphabet soup of services. Now you are fully entrapped, and cannot roll back to your own infra, let alone another cloud provider, without another rewrite.

A lot of big financial firms are 5+ years into this. Several have rolled back for certain use cases due to cost, especially anything with a lot of data transfer because yeah.. performant storage in the cloud & egress are expensive, duh.

You can still use standard stuff like Kubernetes, even if you go microservices. I don't think it's that bad.

I'd say Cloud lets you do a few things, but the way I think of it ultimately is it lets you spend opex instead of capex. If that means though that your opex will end up higher than your capex, then it would be silly to go with it.
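A rough way to check which side of that line you're on is to compare cumulative spend over time. This is a simplified sketch with hypothetical figures; it ignores staffing, depreciation, hardware refresh cycles, and discounting, all of which matter in a real comparison.

```python
from typing import Optional


def cumulative_on_prem(months: int, hardware_upfront: int, monthly_running: int) -> int:
    """Capex paid on day one plus ongoing running costs (power, colo, etc.)."""
    return hardware_upfront + monthly_running * months


def cumulative_cloud(months: int, monthly_bill: int) -> int:
    """Pure opex: nothing upfront, a bill every month."""
    return monthly_bill * months


def break_even_month(
    hardware_upfront: int,
    monthly_running: int,
    monthly_bill: int,
    horizon_months: int = 60,
) -> Optional[int]:
    """First month where total cloud spend exceeds total on-prem spend, if any."""
    for month in range(1, horizon_months + 1):
        on_prem = cumulative_on_prem(month, hardware_upfront, monthly_running)
        if cumulative_cloud(month, monthly_bill) > on_prem:
            return month
    return None


# Hypothetical figures in USD, for illustration only.
month = break_even_month(hardware_upfront=120_000, monthly_running=3_000, monthly_bill=8_000)
print(f"cloud becomes the more expensive option in month {month}")
```

If the function returns `None` within your planning horizon, the opex route never overtakes the capex route on these inputs, and the cloud is the cheaper choice on cost alone.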

The other thing is in theory your reliability should be higher, but, again, that will depend on your individual situation, and how much reliability matters to you.

You CAN, but of course that's not what AWS steers you to.

Once your org has gotten to that step, it's been so steered by AWS staff it's hard to imagine suddenly finding sense and building with open standard stuff. Very few AWS shops I have encountered avoid the siren call of various AWS-only or AWS-specific services, which they then become heavily ingrained in..

Generally I do think it's mostly about transforming CapEx to OpEx, with the rest of the stuff being noise.

Well, then they can flip a coin for which mantra to follow. If you pick "cloud bad" you'll get stories also about companies that refused to go to cloud when it makes sense to.

Oh how I wish it had been so. The cloud has been a hard sell all along. Also, 20 years ago S3 and EC2 didn’t exist, so maybe it’s been a little less time than that.

The cloud is better and worse than bare metal. It depends on the use case.

AWS is Kafkaesque though

I have had the same experience. And all this, even though Amazon has granted us really generous and free annual plans and professional advice (all inclusive for a non-profit GmbH).

There is a massive difference between "[if X happens] consider dropping the cloud" and "cloud bad".