by TheHydroImpulse 3505 days ago
Until you need to add another server, or 10 or 100... Not to mention it's another set of skills you need to have. It's a tradeoff. (I'm not talking about "we may need 100 servers next year because we'll have all this traction by then" -- I'm talking about "our load is growing at 1.5x every month" or "next month we need X capacity".)
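To put the 1.5x-per-month figure in perspective, here is a minimal Python sketch of the compounding. The function name and the starting server counts are made up for illustration, and it assumes the number of servers you need grows in proportion to load.

```python
def months_until(current_servers: int, target_servers: int,
                 monthly_growth: float = 1.5) -> int:
    """Whole months of compounded growth until target_servers is reached.

    Assumes required capacity grows by `monthly_growth` every month and
    that server count scales linearly with load (a simplification).
    """
    if current_servers <= 0 or target_servers <= 0:
        raise ValueError("server counts must be positive")
    if monthly_growth <= 1.0:
        raise ValueError("monthly_growth must be greater than 1")

    months = 0
    capacity = float(current_servers)
    # Multiply month by month rather than using logarithms, so the
    # result is not sensitive to floating-point rounding at the boundary.
    while capacity < target_servers:
        capacity *= monthly_growth
        months += 1
    return months


if __name__ == "__main__":
    print(months_until(1, 10))   # 6
    print(months_until(1, 100))  # 12
```

Under those assumptions, one server's worth of load becomes ten in about half a year and a hundred in about a year, which is the kind of growth the comment is describing.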

I just finished booting up two new clusters with 5 and 15 nodes, respectively, and cycled them a couple of times after making changes to the AMI. The clusters are in ASGs and will scale based on resource usage. I can't do that with bare metal.


This argument always comes out when someone points out how much bare metal stomps AWS. Here are some counterpoints:

a) A large portion of what's hosted on AWS probably doesn't need that level of sudden burst scaling.

b) With something like SoftLayer/IBM you can scale physical servers, usually within 30-60 minutes.

c) If your burst scaling requirements are temporary and you are located in a decent DC, you can probably spin up some infra in AWS, access your physical stuff over a private network connection, and get the best of both worlds.

As always, use what's best for your environment.

a) I don't really have a good enough sample size but I'd imagine a lot don't.

The biggest selling point of AWS is everything around it. You don't just get EC2, you get Route53, ELB, VPC, RDS, S3, CloudFront (although it's kinda expensive), ECS, etc... If I can pay AWS to do something instead of building it, I'll do it.

Most startups hope that they'll suddenly need to increase capacity by 100x, but it nearly never happens. Most vendors can provide dedicated servers within a few minutes (if you don't order too many at once), so scaling is still possible in the vast majority of cases.

Even if you always have to scale up for 1-2 hours per day, using dedicated hardware that's idle the rest of the day is probably cheaper in most cases.
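A rough way to check this for your own numbers is sketched below in Python. It compares dedicated hardware provisioned for peak against cloud instances that run the baseline around the clock and the burst only when needed. The function names are hypothetical, and the prices in the example are invented placeholders, not real quotes from any vendor.

```python
def dedicated_daily_cost(base_servers: int, burst_servers: int,
                         monthly_price: float, days_per_month: int = 30) -> float:
    """Daily cost of owning enough dedicated servers to cover the peak.

    The burst servers are paid for all day even though they sit idle
    outside the burst window.
    """
    return (base_servers + burst_servers) * monthly_price / days_per_month


def cloud_daily_cost(base_servers: int, burst_servers: int,
                     burst_hours: float, hourly_price: float) -> float:
    """Daily cost of on-demand instances: the baseline runs 24 hours,
    the burst instances run only for `burst_hours`."""
    instance_hours = base_servers * 24 + burst_servers * burst_hours
    return instance_hours * hourly_price


if __name__ == "__main__":
    # Placeholder prices, chosen only to illustrate the comparison.
    monthly, hourly = 120.0, 0.5

    # Peak is 2x baseline for 2 hours a day: dedicated wins here.
    print(dedicated_daily_cost(10, 10, monthly))   # 80.0
    print(cloud_daily_cost(10, 10, 2, hourly))     # 130.0

    # Peak is ~100x baseline for 2 hours a day: the cloud wins here.
    print(dedicated_daily_cost(1, 100, monthly))   # 404.0
    print(cloud_daily_cost(1, 100, 2, hourly))     # 112.0
```

With these made-up prices, idle dedicated hardware is cheaper when the daily peak is a modest multiple of the baseline, and the balance only tips toward on-demand capacity when the burst is very large relative to the baseline.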

Oh, for sure. A lot of startups don't need that ability. I work for a pretty infra-heavy startup, so AWS is simply required at this point. But we've hit AWS capacity limits at the worst times (one of our clusters processing 20k events/sec hit 100% utilization) and they literally had no capacity left for that instance type. It's not a perfect thing all the time.

But in the end, the pros significantly outweigh the cons. Our resource consumption is naturally extremely elastic. While we'll always need to slightly over-provision to maintain some headroom, adding and removing nodes as load varies saves quite a bit of $.

There are other benefits also:

1. You can get started dirt cheap or, in some cases, for free

2. There's a common API for requesting new instances and performing maintenance tasks

3. There are extra services available to help build your apps such as SES, S3, and RDS to name but a few I found very helpful.

I'm not saying anything in this thread is wrong. But in software engineering, we say "write the code that only you can write", which is a suggestion (but not a rule) to use pre-built libraries instead of trying to make your own. Perhaps we should also say, "run the instances that only you can run".

>2. There's a common API for requesting new instances and performing maintenance tasks

Only true if you commit to vendor lock-in. If you use a higher-level cloud-agnostic library, then it likely works with OpenStack as well, so you can manage on-prem and off-prem instances the same way.

At a high enough scale, you have a lock-in _somewhere_. Spending time trying to abstract yourself from any lock-in can be wasteful.

You can also temporarily rent VPS servers, which are still cheaper than AWS, and add them to the cluster whilst waiting for dedicated hardware.

Unfortunately, mixing and matching ends up really complicating things, especially with security in mind. Many people run within a VPC, and bridging to another private network is, well, I don't really want to think about it at this time.

We've found OpenVPN to be our friend here: create an overlay network that doesn't really care if nodes are bare metal or "cloud".

I thought about that too, but as far as I can see, with OpenVPN the single OpenVPN server is a single point of failure, and all the traffic goes through that server, which quickly becomes a chokepoint. If I needed this again, I'd try out tinc first. It does not appear to have the single point of failure issue.

We have multiple standby servers to prevent the SPOF issue.

One problem we HAVE seen is a reduction in maximum bandwidth. Since we're CPU limited, however, it hasn't really been an issue.

That's the thing - it is much easier nowadays. Kubernetes requires your containers to run on a flat, shared network namespace, so your new machine joins that network. It is like running within a VPC. Software like Rancher makes adding a new server a matter of running a one-liner on it.

"Sure, you could just buy a Toyota Corolla and get to and from work without much hassle. However, I commute in a Lamborghini Gallardo in case I need to get from 0-60mph in 2.8 seconds to snag a narrow spot on the expressway from an onramp. I can't do that with a Toyota Corolla."

I can't wait until the day that we mature enough as an industry to consider running any kind of baseline workload on EC2 negligent.

I'm talking about _my_ use case for using AWS. I'm sure other people have similar requirements. We manage hundreds of servers, process over 50 billion events/month and losing data is unacceptable.

In the HN echo chamber, you might think everyone just has a SPA and just needs a DigitalOcean droplet. Everyone has different requirements, and AWS fits those for many people.