by squallstar 1604 days ago
Yes, tell me about not needing the cloud (aka a managed provisioning and scaling service) when your poorly configured database breaks, or when you need 3 hours of downtime on prod because you need to reboot and reconfigure your services, or your release breaks production because you're using a diff tool to run deployments, or you simply have no option to scale horizontally past your single VPS once traffic comes in.

Very clickbait article, please don't blindly follow recommendations by someone who obviously doesn't get even services like Elastic Beanstalk, Lambda and Identity management (:shrug:).

Sure, a VPS is fine for a low risk pet project like your portfolio, a blog, some marketing websites, the project I built over the weekend and a few other things. For anything else, there's literally not a single reason for not wanting to use a cloud service / a managed provider.

17 comments

> Yes, tell me about not needing the cloud (aka a managed provisioning and scaling service) when your poorly configured database breaks

I mean cloud is no panacea here either. We had a multi-day outage of our SQL Data Warehouse in Azure when something broke on their end, and we were stuck sitting powerless waiting for them to fix it. Fortunately for us it was used for offline processing, so the outage just meant we were late delivering fresh data, not fully down.

For those wondering, yes we had backups, yes they were tested so we knew they'd "only" take about 6 hours to restore, but we also had support telling us it was their highest priority and would be back up "soon".

I'm not even saying we could necessarily do better, but I certainly understand why someone might prefer to trust themselves to resolve a situation like this instead of having to rely on a 3rd party that frankly isn't feeling your pain.

>cloud is no panacea

That is the subtitle of this article I wrote last month - https://medium.com/@rykrk/everything-is-just-build-vs-buy-d7...

I note that the grandparent's premise for a catastrophe is that everything in a datacenter must necessarily be poorly configured. To that, I ask: how many corporate cloud footprints have they looked at?

Don't rely on one article to design your tech stack, rely on one comment to know that the cloud is "literally" perfect for everything larger than a blog.
Nowadays there's such an abundance of PaaS that in my opinion it doesn't make any sense anymore to get a VPS and manage it yourself, unless you're willing to potentially spend numerous hours on DevOps yourself in both initial setup and maintenance, and, more importantly, you're aware of the implications when you go past the traffic it can physically serve.

Furthermore, managing your own server potentially leaves more room open for misconfigurations (including backups) and definitely won't get you past any information security questionnaire.

I'm a "DevOps" working in a company 100% on AWS and you know what? There are hundreds, even thousands, of man-hours per year of "DevOps" work, even in the cloud, even when using managed services. You don't need to manage VPS maintenance but you do need to spend time refactoring Terraform code, you need to check for the best EKS workers to not shell out a fortune, now you have to migrate things to ARM/Graviton because Finance is telling you that we are throwing money away. Heck, you need to have one full-time person to control the bill and the spending!

You might say these problems come more from the scale of the business than from the bare-metal/cloud dichotomy, but if you are in a startup like the one the linked article is about, well, you still have to do all the work, and it doesn't change much whether you have to know how AWS/Azure/GCP work and bill you or which command-line backup tool you want to use. There will always be more than just pure code.

> you need to check for the best EKS workers to not shell out a fortune, now you have to migrate things to ARM/Graviton because Finance is telling you that we are throwing money away

A lot of software projects I've been involved in had some extra complexity (asynchronous processing, etc) because the cloud is expensive and the hardware they ran on was underpowered as a result. This introduces more moving parts and you have less margin of error. These moving parts (as opposed to the underlying hardware) can fail and cause an outage and this may happen more frequently than if you were operating on a single, hardware point of failure, defeating the entire purpose.

In my own project running mostly on bare-metal it is much simpler because a lot of worry about cost or performance goes away. Yes I could put this in a queue and maybe it's the right solution down the line, but in the meantime I have so much CPU that I can afford to do the task on the main thread and not worry about any of this. I also have much more margin for error in terms of resources (CPU/RAM/disk) so that if a process does go haywire it'll take much more time for it to cause an issue, buying you time to notice and fix the problem before it takes the whole system down.

There is this whole level of complexity in enterprises where enabling groups package or wrap cloud services that should be used by their devops teams, such that the result is compliant with the company's standards for security, monitoring, etc.

Even though the product of the vendor doesn't change, there is a constant maintenance load on the devops teams to keep the internal packages/pipelines up to date with whatever the enablement teams cook up every time.

When doing new things, most time is lost in navigating the landscape and setup of these pipelines and products, firewall rules, network rules, which live in another documentation universe from the cloud as the world knows it.

No matter what you do, IaaS or SaaS: the enterprise is going to keep you busy.

> Furthermore, managing your own server potentially leaves more room open for misconfigurations (including backups) and definitely won't get you past any information security questionnaire.

If you believe that cloud magically removes this as a hurdle then you surely should not be dumping your infrastructure into a cloud provider. Misconfigurations in cloud are real, happen often and are oftentimes much harder to validate without 3rd party tools or spending a lot of time building tooling to work out whether your footprint is doing what you think it is.

Misconfigurations in the cloud are much more dangerous as well as you can quickly accumulate charges and there is no way to implement a spending cap. With bare-metal hardware the costs are typically fixed and agreed in advance.
Is downtime an actual, business-killing problem in practice? In my experience, very rarely so, and clouds also have downtime (over which you have much less control) and it seems like we live with it just fine.

> when you need 3 hours of downtime on prod because you need to reboot and reconfigure your services

What about when your RDS instance fails and is then stuck on "modifying" for an indefinite period of time (ended up being 12 hours, and I suspect an AWS engineer eventually did a manual operation to fix it) and you have to restore from a backup and rebuild the missing data manually from other sources such as logs in the meantime just to get back online? I've seen it happen and would've much preferred having the option to SSH in and recover it manually.

Scaling is less of a problem when bare-metal is so cheap that you can significantly overprovision and never have to worry about autoscaling. This also means you need far fewer moving parts that can break and take your service down.

I'm not saying that the cloud is always bad, but a hybrid approach would be the most pragmatic choice. For raw compute and bandwidth, bare-metal is orders of magnitude cheaper. You can still use the cloud's managed services from those if you need them, though given how cheap bare-metal is you may realize that you no longer need a lot of them.

When it comes to management/sysadmin work, every shop that uses the cloud beyond very small projects that are fully on a PaaS such as Heroku has a dedicated DevOps person (or more), no different from bare-metal in terms of effort. I'd argue it's more effort than bare-metal because clouds and their associated services, APIs and tooling (Terraform, etc) change much more frequently than old-school Linux and hardware.

> Is downtime an actual, business-killing problem in practice? In my experience, very rarely so

Of course it is. If you make a service that other people use as part of their workflows or business, they will be switching providers if you’re the only one that routinely goes down.

This is also a slippery slope. If your engineering team is in the habit of shrugging off downtime as no big deal, it tends to get worse and worse as time goes on, staff turns over, systems scale up, and load increases. If you can’t manage to keep downtime to a minimum when you’re small, it’s going to be much worse when you’re bigger.

> Is downtime an actual, business-killing problem in practice?

Yes, if the business is an enterprise, unscheduled downtime can result in significant losses. It's why disaster recovery is a serious business.

Of course, there are businesses where uptime is absolutely critical, though I'd argue a lot of those already operate their own hardware for that reason (and would benefit little from moving to the cloud) or already have a cloud-based, distributed (multi AZs or multi-cloud even) system in place.

But is this actually the case for most companies? The AWS outages always have major ripple effects across the internet, suggesting that a lot of companies don't actually do what is needed to guarantee uptime and manage to survive and succeed despite that.

I worked at a company that had Target and other Fortune 500 retailers as clients, and we had very strict SLAs with financial penalties if we broke them. There was absolutely the possibility that we could have ended up owing our clients more than they paid us.
I don’t know. We used a managed kubernetes cluster at my previous work and it was a shit show. The support was worse than useless and we couldn’t do much about our issues.

I now manage kubernetes myself and it’s much better. If there is a problem I can fix it myself and not have to deal with overseas support who barely understand the issue half of the time.

Same here at previous co. The so-called “platinum support” on certain providers could be better categorized using a different compound.
> there's literally not a single reason for not wanting to use a cloud service

How about that the major ones, from Amazon, Microsoft, and Google, all have a track record of horrible corporate ethics?

Or the fact that AWS itself seems to have a major (multiple hour) system-wide outage at least once a year?
System-wide? Not really since AWS has hundreds of services. Some services or some regions fail sometimes, but in 99% of cases good architecture can avoid that.
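The "good architecture" point can be made concrete with a rough calculation. This is a minimal sketch, assuming regions fail independently of each other (a strong assumption that real outages sometimes violate) and using a hypothetical per-region availability of 99.9%; neither number comes from the thread.

```python
def combined_availability(per_region: float, regions: int) -> float:
    """Chance that at least one of `regions` redundant regions is up.

    Assumes failures are statistically independent, which is an
    idealisation: shared control planes or global services can take
    several regions down together.
    """
    if not 0.0 <= per_region <= 1.0:
        raise ValueError("per_region must be between 0 and 1")
    if regions < 1:
        raise ValueError("regions must be at least 1")
    # The service is down only if every region is down at the same time.
    return 1.0 - (1.0 - per_region) ** regions


# Hypothetical per-region availability of 99.9%.
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.999, n):.7f}")
```

Under these assumptions each extra region shrinks the expected downtime by a large factor, but only if the application is actually built to fail over, which is the engineering effort the rest of the thread is arguing about.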
Sure, I'll warm my home and feed my family on good vibes, then.

I'm trying to build something awesome, not work for the Peace Corps. I already accept a certain amount of third or fourth degree Evil, just by using Apple products, or even using this website (can you be certain not a single byte of this data touches material that was made under coercive working conditions?).

It's the whole The Good Place problem, all over again; it's impossible to live a moral life in a globalized society, if you think any use whatsoever of anything that could have in some way contributed to an amount of unhappiness is a Thing Worth Avoiding.

Well to clarify, I'm not necessarily saying that the ethics reason clearly outweighs all other tradeoffs. I was responding to the very specific assertion that "no reason exists". My greater point would be that the comment I'm responding to was being a little hyperbolic. There are definitely reasons not to use cloud. Your ultimate decision, especially in the context of project-specific requirements, is a different, bigger discussion.
Why do you need to use a major service though? There's a ton of cloud providers you can use that offer pretty much a similar range of services and are certified, e.g. ISO 27001, SOC 2.
Major services have major pay outs when majorly bad things happen to their enterprise customers. Finance will not approve a level of risk that would bankrupt your cloud provider for any critical service. I suppose this is mostly immaterial for startups given they don't have the weight or value.
I had a managed database go completely haywire. If I'd been able to SSH in or connect as an admin, the problem would have been found and resolved quickly (indices had outgrown the RAM by a significant factor). As it was, I ended up with around 4 hours of downtime while waiting for support to do their thing. Now, this was all with a crappy provider (Rackspace) for a legacy app, but still, outsourcing competence is no panacea.
Elastic Beanstalk is nice, but its autoscaling is kinda crap sometimes. It’s temperamental with shared load balancers. Logging of Docker stacks sometimes works, sometimes auto-detect fails and CloudWatch doesn’t get anything any more.

RDS is great, but sometimes there aren’t t3 instances for a month in your region. You can never make a disk smaller.

Bare metal instances are nice, especially when they cost 1/10 of an equivalent VM. 10x on bare metal gets you a looong way. Biggest issues I have with Hetzner are the lack of 10G Ethernet on most instance types and the funky Open vSwitch overlay network.

I downvoted your comment because it doesn't engage with the author's argument. Instead you simply write dismissively "a VPS is fine for a low risk pet project like your portfolio, a blog, some marketing websites, the project I built over the weekend and a few other things. For anything else, there's literally not a single reason for not wanting to use a cloud service / a managed provider."

Everything you wrote in your first paragraph is obviously something that factors into the decision, but nothing more. You're just a random person on the internet, so your appeal to yourself as an authority does not lead to a useful conversation.

The scenario you describe hasn't happened to us. We haven't had more than a couple of minutes of unscheduled downtime in the last 10 years.
That's interesting. I help my clients to migrate away from the cloud and they never had the problems you mention. Their financial department appreciates the stability of expenses, though - and the fact that in some cases they are an order of magnitude lower.
"a VPS is fine for a low risk pet project like your portfolio, a blog, some marketing websites, the project I built over the weekend"

You are making the same mistake the article did. A VPS is a great option for a lot of production applications and not just low risk or pet projects. Not every production application needs Elastic Beanstalk or Lambda.

The answer is always "It depends but all options are on the table"

I agree, I deliberately generalised with those examples to make it a little clearer that as the required traffic, capacity and risk increase there are a lot of benefits to using services that automatically configure and provision load balancers, network routers, machines, clusters, etc. for you.

This is not to say that it's a "one click configuration+deploy with no security issues whatsoever", but depending on the clients you work with you may be rightfully forced into using a cloud provider and hiring a DevOps engineer to manage the infrastructure.

> there's literally not a single reason for not wanting to use a cloud service / a managed provider.

How did we get to this point of learned helplessness? Some people are in fact capable and do run their own software stacks, configure and manage their own databases, and are self-sufficient in providing their own service to their customers and users.

> because you're using a diff tool to run deployments

I really don't see what using a diff tool has to do with the cloud.

This is basically the textbook definition of a straw man argument.

$0.09/GB is a good reason
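To see how that per-GB figure adds up, here is a minimal sketch, assuming it is a flat outbound data transfer rate with no tiers or free allowance (real price lists are usually tiered). The monthly volume of 10,000 GB is a hypothetical number chosen for illustration, and the function name is not from any provider's API.

```python
def egress_cost_usd(gigabytes: float, rate_per_gb: float = 0.09) -> float:
    """Cost of transferring `gigabytes` out at a flat per-GB rate.

    The default rate is the figure quoted in the comment; treating it
    as flat (no tiers, no free allowance) is a simplifying assumption.
    """
    if gigabytes < 0:
        raise ValueError("gigabytes must not be negative")
    return gigabytes * rate_per_gb


monthly_gb = 10_000  # hypothetical traffic volume, not from the thread
print(f"${egress_cost_usd(monthly_gb):,.2f} per month")  # $900.00 per month
```

The cost grows linearly with traffic, which is why it matters most for bandwidth-heavy services.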
> there's literally not a single reason for not wanting to use a cloud service / a managed provider

How about lock-in?

Depending on how you provision the services (e.g. infrastructure as code) you may not be locked in at all and switching to a new provider is relatively straightforward. This comes up quite often in infosec questionnaires for DRP, so if you're hosting mission critical applications you must be ready with a backup plan and able to switch to a new vendor if required.
> (e.g. infrastructure as code)

Terraform configuration for AWS is AWS-specific. It cannot be ported to other stacks. Of course, you could design your terraform code such that you only use modules that abstract away the cloud provider (presumably you'd have to write them yourself), but I'm not sure if anybody does that, and if so, it probably suffers from the same downside as "database-agnostic code", i.e. lots of potential for optimising for a specific hoster and/or for using specific features exclusive to them goes to waste.

To be honest, the tech world is full of dilettantes with too much free time.
For many companies, 3 hours of downtime per month are no problem at all. So they can pocket the cost savings because they don't need the reliability that you pay for with a cloud.
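For scale, "3 hours of downtime per month" can be converted into an availability percentage. A minimal sketch, assuming a 30-day month (720 hours); the function name is illustrative and not from the thread.

```python
def availability_percent(downtime_hours: float, period_hours: float = 30 * 24) -> float:
    """Percentage of the period during which the service was up.

    The default period is a 30-day month, which is an assumption;
    calendar months vary between 28 and 31 days.
    """
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must lie within the period")
    return 100.0 * (1.0 - downtime_hours / period_hours)


print(f"{availability_percent(3):.2f}%")  # 99.58%
```

So three hours a month is a bit better than "two nines" and well short of "three nines" (99.9%), which would allow roughly 43 minutes over the same period.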