by blissofbeing 2372 days ago
Easier said than done. To implement that I will probably need to hire a DevOps guy, and now we have all this CloudFormation (or whatever your infra-as-code choice is) code to manage. So in reality it probably costs more (DevOps and more code to manage) than if I just went with a couple of cheaper bare metal servers.

If you are running in the cloud then you still need a DevOps guy, same as you would if you were on bare metal. In fact you will probably need _more_ DevOps people the deeper you get into the AWS ecosystem.


Datapoint: We have 2 "DevOps guys" supporting a significant AWS infrastructure. We autoscale from 200 ec2 instances at night to 700 ec2 instances during the day. We run 60+ microservices, each of which has multiple processes that run, each of which is autoscaled (we use ECS). We use Aurora (with autoscaled readers) and DynamoDB (autoscaled IOPS). We manage all of that with 2 "Devops Guys".

Granted, we're a mature startup and have put a few years of investment (at the cost of 2-3 "Devops Guys") into our infra, but ultimately it doesn't take much to manage a ton of AWS infra once the tooling is in place.

Man, that sounds so luxurious. I'm begging for us to hire a second guy because I'd like to not always be on point for everything and to take vacations. Probably running an order of magnitude more stuff than you described, multi-cloud and with Terraform.

Terraform and the fact that I came in with experience makes this doable. But only just.

Just for a back-of-the-envelope estimate, how many customers are you able to support per EC2 instance?
WhatsApp used to be hosted on ~15 bare metal servers serving 100 million concurrent users...
They were also acquired at a price which would value each employee at ~350M.

They were capable of scaling in a way that is certainly an anomaly, and not indicative of the costs of an ordinary team.

It speaks volumes about what the right talent and architecture/technology choices can do if leveraged successfully, but is more of an interesting anecdote than a realistic infrastructure budget.

> They were also acquired at a price which would value each employee at ~350M.

That’s a pointless calculation. The acquisition wasn’t for the employees. As with all network-effects products, the acquisition was for the active user base. They could have acquired WhatsApp, fired the engineering team, rewritten it with an architecture that required 100x the servers, and still been happy.

It speaks volumes about Erlang/BEAM, I think.
I find this so fascinating, is there more info on the software/hardware during this time period?
That was before it was possible to share images. BTW, they used S3 for that.
We have give or take 30M monthly active users.

The instances we use are not the largest (we use 2xls) but we also incorporate spot instances as part of our autoscaling.
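For a rough sense of scale, the figures quoted in this thread can be turned into a per-instance ratio. This is only a sketch: it assumes the 200 to 700 instance range mentioned upthread describes the same fleet as the 30M figure, and the constant and function names are made up for illustration. Monthly active users are also not concurrent users, so the result is not directly comparable to the WhatsApp number, which is used here exactly as quoted in the thread and not verified.

```python
# Back-of-the-envelope ratios using only numbers quoted in this thread.
# All names here are illustrative; none come from a real system.

MONTHLY_ACTIVE_USERS = 30_000_000   # "give or take 30M monthly active users"
INSTANCES_NIGHT = 200               # autoscaling floor quoted upthread
INSTANCES_DAY = 700                 # autoscaling peak quoted upthread

# WhatsApp figures as quoted in the thread (unverified, concurrent users).
WHATSAPP_CONCURRENT_USERS = 100_000_000
WHATSAPP_SERVERS = 15


def users_per_server(users: int, servers: int) -> float:
    """Return the average number of users attributed to each server."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return users / servers


mau_per_instance_night = users_per_server(MONTHLY_ACTIVE_USERS, INSTANCES_NIGHT)
mau_per_instance_day = users_per_server(MONTHLY_ACTIVE_USERS, INSTANCES_DAY)
whatsapp_per_server = users_per_server(WHATSAPP_CONCURRENT_USERS, WHATSAPP_SERVERS)

print(f"MAU per instance at night (200): {mau_per_instance_night:,.0f}")
print(f"MAU per instance at peak (700):  {mau_per_instance_day:,.0f}")
print(f"Quoted WhatsApp concurrent users per server: {whatsapp_per_server:,.0f}")
```

The two ratios measure different things (monthly actives against a mixed microservice fleet versus concurrent connections on dedicated chat servers), so treat the comparison as an order-of-magnitude curiosity and not a benchmark.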

What do you need if you're managing a significant amount of metal? No devops guys?

Everything is a tradeoff.

You can manage things efficiently in AWS if you do it right. You can manage metal efficiently if you do it right.

You can kill your business if you do either one wrong.

One of the benefits of the cloud is that the developers should be able to easily manage their own infrastructure. After all, they should be the people most familiar with the performance profile of their service/microservice/application. They should be the ones making decisions like using Aurora vs Dynamo vs managing your own DB on VMs or bare metal vs a Hadoop cluster across VMs. They should own their deployment pipeline with CI/CD. If you have a dedicated DevOps person or team on a pure cloud application, either you are a very large organization that is coordinating across multiple development teams that each have their own infra, or you have built something brittle and not entirely cloud native (e.g. self-managed Cassandra or Elasticsearch on a cluster of VMs). (A third possibility is a complex microservice architecture where it’s nice to have someone purely in charge of “the system view” of the infrastructure even with a small number of developers.)
Sounds like a good way to get a runaway bill.
Why? Do you think developers can’t consider cost/performance? Do you think engineering managers and their finance partners don’t care? Maybe I’ve gotten lucky with company choice, but the engineer who finds massive cost savings due to optimization gets recognized over someone implementing the next basic feature.
Are you saying that the developers are unable to look at a bill?
I disagree. Any of our senior developers can create a CloudFormation template, or take one that we already have, make minor changes, and include it in their repository.

Everywhere I have worked, it’s been the responsibility of the team who wrote the code to create the CI/CD pipelines.

Wait, are you really running an environment where your developers have an understanding of operations?

Dang, I wish we had a term for this blending of roles...

We have grown in complexity since we first started to have a dedicated “ops team”, but honestly it’s because developers just didn’t want to do the grunt work and we needed someone to make sure everything was done consistently.

But still, ops serves developers, not the other way around. The senior developers who knew AWS well basically set the standards and kept ourselves accountable to the ops guy we hired, even though any of us could override him because of our influence in the company.

I started taking away some of my own access and privileges just so I would be the first to hit roadblocks and feel the pain of other developers who weren’t given the keys to the kingdom.

But you understand Ops, and you have your developers understand Ops, which is my point.

Hiring "DevOps" teams completely misses the point, in the same way that I don't hire Unit Testing teams to do the grunt work of writing the unit tests that my Devs don't want to write.

When a Dev understands Ops they write more efficient code, as they realise what storing your entire DB in cache really means for the server.

This is my experience too, albeit sometimes it does feel good to have an infra specialist on the team. Edge cases do happen.
That’s what the AWS Business Support is for.

But I feel at least three of the senior engineers (including me) could hold our own against any “specialists”. My experience is that too many of the “specialists” are old school netops people who got one certification and treat AWS like an overpriced colo.

AWS likes to pawn their customers off to Certified Partners for outsourced solutions.

I'm a software engineer who went the specialist route because it does take real skill to do this well. Yes, I am embedded on a team of old school netops people now, but I'm in charge of all of this and I get to drag them kicking and screaming into the modern world.

Specialists are worth it if you find the right one.

I’m a software developer/architect/team lead/single responsible individual depending on how the wind blows, but after a few years of adding AWS to my toolbelt, I think I can hold my own and I have been recruited to be on the infrastructure side.

Old school netops folks are so afraid of becoming less relevant that they do their best to keep control. But at least they are harmless compared to the ones that have tried to transition to the cloud. Those are actively harmful, costing clients and companies more with little to show for it.

And no I am not young. I’m 45 and started programming in assembly in the 80s.

“lift and shift” should be phase 1. Not the end goal.

I think we pretty much see eye to eye here :)
>> and now we have all this CloudFormation ... code to manage

The CloudFormation code is (likely to be) much smaller than your application code. And if your intention and need is to have a couple of servers, then you don't really need any infrastructure code. In these cases, yes, bare metal is much cheaper and (probably) the better option.

> Easier said than done. To implement that I will probably need to hire a DevOps guy, and now we have all this CloudFormation (or whatever your infra-as-code choice is) code to manage. So in reality it probably costs more (DevOps and more code to manage) than if I just went with a couple of cheaper bare metal servers.

This does not make any sense. You don't need CloudFormation or anything, you can just use a wizard and provision as many VMs (bare metal or otherwise) as you need. It's literally a form and next -> next -> next.

Now that you have systems you can login to, their complexity is the same – except you won't have to care, or manage any hardware.

You still have to manage those systems yourself. Keep them patched, secure, and the workloads up. It's your choice whether or not to delve deeper into the AWS ecosystem.

Note that even though I said you don't need CloudFormation (actually just use Terraform instead), you have a lot of power at your disposal if you do. You can't automate racking and stacking of physical servers, but you can fully automate the lifecycle of a cluster of VMs. At my job I can bring up a cluster of 40+ VMs containing many kinds of workloads with a single button press. And destroy it just as easily (for non-prod). That's invaluable.

> Now that you have systems you can login to, their complexity is the same – except you won't have to care, or manage any hardware.

Depends upon the number of servers you have.

Back when I worked on several somewhat popular websites (a handful with ~1-5 million daily unique users), we had about 40 servers and they mostly took care of themselves. Between me (primarily a developer) and the CTO we averaged maybe a single day thinking about hardware per month, and that was mostly to install new hardware rather than taking care of existing stuff.

If you have this number of servers, once you have something like Ansible set up (we used cfengine back in the day, ugh), both hardware and software mostly manage themselves.

What are you comparing? If you’re comparing a large, HA SaaS use case with a static website on a VPS, then of course the latter requires less DevOps work, but if you hold the use cases equivalent, AWS requires much less DevOps work than bare metal. Notably, just because you’re using bare metal doesn’t mean the motivation for infra-as-code goes away; to the contrary, you need more of it because you’re now managing more services that come out of the box on AWS.
Your static site would still probably be better on AWS. Whack it into an S3 bucket, put CloudFront in front of it, and you have a globally scalable, CDN-enabled static site with at least five 9s of reliability, and it costs you peanuts.
In 2007/2008, I managed a GIS server / website that averaged 100 hits per day, but once every couple of years the website would get listed on Time/CNN and would get millions of hits.

I set up an EC2 instance behind a load balancer and set it to auto-scale. Done. If I had to handle that on bare metal, I would have had to upgrade switches, manage dozens of servers, deal with hard drive failures, etc., and most of that would be idle 99% of the time.

As someone who is often categorized as a devops guy, there really shouldn’t be such a role. The whole point of devops is that you have developers that can perform operations tasks.
Devops means different things to different people. I'm a devops guy surrounded by devops guys, and we are all operations people who develop, which day to day is completely different from being a developer who performs operations.
In that case you can use Heroku or a 'Managed Cloud Experts' provider like Rackspace.