by mbesto 3402 days ago
This is incredibly misguided.

> The cloud as a money saving venture is and always has been a damn lie.

I personally have conducted migration savings assessments for companies going from data centers (owned/leased hardware) to cloud services. I can tell you with 100% certainty that the savings are there, and I have seen financial proof of them.

It's either that, or I'm a liar.


Okay, help me out here. Here's my math: I move about 10 TB of bandwidth a month, I currently am in a DC that offers me 30TB on gigabit for $90/mo for my 1U box that hosts my small business and a few development systems - my server cost me a total of $1500 with 72GB of RAM and 4x3TB drives in RAID10, so I have 6TB of disk that's about at 50% capacity.

To go to Amazon EC2 let's see... 64 GB of RAM for $0.862 should do the job there, but just for that I'm at $620/mo. Then let's add bandwidth, 10TB out at $0.01/GB = $100/mo. Now for disk I have to use EBS which runs (let's say st1 will do the job) 0.045/GB/mo*3TB = $135/mo. And that's before any IO.

So to move from bare metal to amazon I go from $1.5k upfront + $90/mo to a total of $855/mo on Amazon.
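As a rough sketch of the arithmetic in the comment above: the prices are the ones quoted there, while the 720-hour month and the 1 TB = 1000 GB conversion are assumptions made for illustration, and IO charges are left out just as they are in the comment.

```python
# Figures are the ones quoted in the comment; the 720-hour month and the
# 1 TB = 1000 GB conversion are assumptions made for this sketch.
HOURS_PER_MONTH = 720

ec2_hourly = 0.862        # $/hour for the 64 GB instance
egress_gb = 10 * 1000     # 10 TB out per month
egress_rate = 0.01        # $/GB, as quoted
ebs_gb = 3 * 1000         # 3 TB of st1
ebs_rate = 0.045          # $/GB-month

compute = ec2_hourly * HOURS_PER_MONTH
bandwidth = egress_gb * egress_rate
disk = ebs_gb * ebs_rate
ec2_total = compute + bandwidth + disk

colo_monthly = 90.0       # $/month for the 1U colocation
server_price = 1500.0     # one-time hardware cost

# How many months of the monthly difference pay for another server.
months_per_server = server_price / (ec2_total - colo_monthly)

print(f"compute ${compute:.2f} + bandwidth ${bandwidth:.2f} "
      f"+ disk ${disk:.2f} = ${ec2_total:.2f}/mo")
print(f"difference buys a new server every {months_per_server:.1f} months")
```

With these inputs the total lands near the quoted $855/mo, and the monthly difference covers the $1500 server in roughly two months.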

I can buy a new server every few months for the same cost of hosting that at Amazon - unless I'm severely missing something. Maybe it's worth it if you're a very small installation or one with very unpredictable traffic? But I just don't see it.

3 things I think you should consider in this situation.

1. Reserved instances. If you've got a running business and are expecting to be around for 1-3 years, you get 40-60% discount on that 4xlarge.

2. Do you want to move the service exactly as it is? Maybe you don't need the large EBS volume? Maybe you can rewrite the storage to use S3 instead, which is much cheaper? Do you have heavy, sporadic tasks that you can move out of your main system and into Lambda plus a queue, and then use a smaller instance?

3. How much do you spend on people to monitor the hardware, source and replace the disks, do firmware updates, etc.? How much on external services to monitor your box, which could be replaced with integrated AWS solutions at free tier?

Basically what I'm trying to say is: if you just lift your current system and move it to AWS, you're ignoring lots of opportunities. You need to consider much more than a 1:1 hardware requirements migration.
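To put numbers on point 1, here is a minimal sketch that applies the 40-60% reserved discount to the compute line only. It assumes the discount does not touch bandwidth or EBS, that a month has 720 hours, and that the $1500 server is written off over 36 months; the amortization period is a hypothetical choice, not something stated in the thread.

```python
HOURS_PER_MONTH = 720        # assumption: 30-day month
AMORTIZATION_MONTHS = 36     # assumption: server written off over 3 years

on_demand_compute = 0.862 * HOURS_PER_MONTH   # $/month, rate from the thread
bandwidth_and_disk = 100.0 + 135.0            # $/month, figures from the thread


def reserved_total(discount):
    """Monthly EC2 bill when only the compute part gets the discount."""
    return on_demand_compute * (1 - discount) + bandwidth_and_disk


# Colocation fee plus the hardware price spread over the amortization period.
colo_total = 90.0 + 1500.0 / AMORTIZATION_MONTHS

for discount in (0.40, 0.60):
    print(f"{discount:.0%} reserved discount: ${reserved_total(discount):.2f}/mo")
print(f"colocation with amortized hardware: ${colo_total:.2f}/mo")
```

This covers only the reserved-instance point; the savings suggested in points 2 and 3 depend on details the thread does not give.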

Why should he re-architect his system and introduce some Amazon-specific dependencies like S3 just so he can give Amazon money? How much development time and money will he spend trying to make his setup work on AWS? How many new bugs will he accidentally introduce in the process? How badly will moving to cloud storage via S3 affect his performance v. having the files on a local disk? When S3 outages like the one that happened two days ago occur, will his customers be understanding of his decision to move into the cloud for funsies?

It's really amazing that, when faced with clear evidence of the expense involved in moving to AWS, you suggest that he prepay for a year of service up-front and redesign his application just so Amazon's bill doesn't look so egregious anymore.

My experience aligns with his. We moved from racks with 20-something boxes to EC2 with 100+ instances. Our monthly bill was 80% of the cost of the hardware in our data center.

How did we solve this problem? Why, move to Docker and Kubernetes of course! Over a year of manpower has been devoted to that task. What kind of savage would ever return to bare metal in this enlightened age of expending millions of manhours redoing stuff that was already working perfectly well?

If you want to autoscale, autoscale on the cloud and keep your primary nodes on bare metal. There's no need to start forking over millions of extra dollars to cloud providers to host all of your infrastructure.

Cloud has some unique benefits, but it should be used for those unique benefits only. There's no reason everything has to be moved there.

I'm trying to compare costs in a better way than "lift app from here and move it there". Sure, you can do it that way, but you're moving an app which was written with your current architecture in mind. It's not surprising it will be more expensive after the move.

I'm not advocating to rewrite just to give Amazon money - you're creating straw man arguments here.

I'm just saying that if we're comparing different ways of hosting, we shouldn't pretend they work the same and have the same tradeoffs. Maybe a one-server approach is optimal. Maybe the cost would be smaller if the service were split into various components. That's all part of a proper comparison, including the cost of getting from one architecture to the other.

Specific example:

> How badly will moving to cloud storage via S3 affect his performance v. having the files on a local disk?

Depends what they do with the data and how the users interact with it. Maybe there's no data that could be migrated (all records need to be available in memory for the web app), maybe it would be slower but have the trade-off of using a smaller instance, maybe it would improve the performance considerably because file-like blobs are not stored in the database anymore and users can get them quicker via local CloudFront caches. There's no generic answer!

As for the outage... S3 was down for a few hours. And for many people it happened when they were sleeping. If your single server goes down in a data centre - what's your cost and time of recovery? S3 going down once a year still gives many companies better availability than they could ever achieve on their own.

If the need to reassess the architecture doesn't convince you this way, think about migrating from AWS to your own hardware. If you were using S3, SQS, Lambda and other services, are you going to plan for standing up highly available replacements for them on separate physical hosts? (Omg, we need so much hardware!) Or will you consider whether it can all be replaced with just redis and cron if you have relatively little data?

>Why should he re-architect his system and introduce some Amazon-specific dependencies like S3 just so he can give Amazon money?

You could make the same argument going the other way though - why should GitLab spend money recruiting/hiring people, leasing space, setting up monitoring, etc, if their solution today works? Why should anyone re-architect their system to give $COMPANY money? When your bare metal's RAID controller craps out and you have to order another, will his customers be understanding of his decision to move to bare metal?

It doesn't make sense to factor in fixed costs of such a migration.

>You could make the same argument going the other way though - why should GitLab spend money recruiting/hiring people, leasing space, setting up monitoring, etc, if their solution today works? Why should anyone re-architect their system to give $COMPANY money?

Well, that reason would be, at a minimum, a halving of their hosting expenses.

It also doesn't necessarily take much refactoring to move either way. Even if you're heavily dependent on cloud storage, etc., you can access that from an external application server.

The person I replied to was suggesting that the parent refactor in order to make AWS costs less egregious without articulating any particular reason that the original commenter should do so.

> When your bare metal's RAID controller craps out and you have to order another, will his customers be understanding of his decision to move to bare metal?

His customers won't need to know, assuming he has a standby that can take over. Even if he doesn't, he can rush down and install one and move the disks over. With an AWS outage, you can't do anything but say "I hope Amazon fixes it soon". The bare metal equivalent is a power or connectivity loss at the DC, which is much rarer than AWS outages.

> rewrite the storage to S3 instead which is much cheaper

So, pay for the privilege of being vendor-locked to Amazon. What a wonderful idea.

That's how business choices work. You gain x for y. If you value no lock-in more than other gain, then that's your choice to make.

(There are multiple services providing S3 interfaces BTW)

Let me clarify - it isn't unequivocally true that every move from data centers to the cloud saves money. I wouldn't have to provide an assessment if it _always_ saved money.

Definitely, but I'm just wondering what the circumstances were where it was actually worth it - I thought my deployment was quite small, but it still didn't seem worth it. I guess due to high bandwidth and storage usage, maybe without those I could get away with it.

You're not including the time costs of operating that server.

Also, most businesses don't have demand that steady. Nor is it advisable in a cloud architecture to rely on a single large server.

Yeah, but I'm just trying to ballpark it. I figured the larger server would be a cost savings over many smaller ones.

Most of it does need to remain online at all times including several large databases. Just running my logstash server alone eats 8 GB of RAM and 200+ GB of disk.

In bandwidth and storage alone I'm past the cost of the server - so even if I could split this up and keep it mostly offline it still wouldn't be close to worth it unless I'm missing something more significant?

> You're not including the time costs of operating that server.

Do you mean swapping the drives about once every 3 years with free DC remote hands? Operational costs there are almost nothing - maybe 10 hours a year of my time. Surely much less than it would take to get everything running on a cloud architecture. And that cloud architecture would still require some level of maintenance and monitoring I'm sure.

> I figured the larger server would be a cost savings over many smaller ones.

Nope, you pay more for a single large machine than multiple small ones.

If your service is truly so stable that you only need to spend 10 hours a year maintaining it (and demand isn't growing), then maybe the cloud isn't right for you. But that's not true for most companies.

Actually pricing at both aws and gcp is linear.

For a single host that you know you'll want in 3 years it probably isn't worth it. If demand grows 10x in the next year though, how does that look? What if the host has a sudden failure? What if the business dies and you want to get rid of the host?

What about the personnel costs? Also, what about the instantaneous bandwidth. If you have huge demand spikes where you need a lot of bandwidth at one time, can your system handle it?

"What if we grow 20x overnight" is basically magical thinking, that's about what it'd take to really cause problems, the odds of it occurring are dirt low. Most of the time it'll lead you to waste money. Certainly not enough to account for nearly that price increase. Especially given that you can still rent dedicated or cloud servers temporarily to accommodate in the event that something like that does happen.

As for personnel, if I had to, I'd hire mostly devops people or pay freelancers for jobs as needed until hitting a limit where you have to dedicate a large portion of someone's time to it, and then repeat the math - I fully suspect that this would mean sticking with bare metal. I suspect a 2 or 3 person group dedicating their time to managing only hardware and basic infrastructure for something like Kubernetes could probably handle hundreds of servers without issue. Even if that averages out to an additional cost of $1000 per server per year, it'd only be approximately $80 per server per month, nothing like the increase of going to EC2. And renting entire racks gets cheaper than the individual colocation I'm paying for at the moment. It'd surely be interesting
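The per-server staffing estimate can be sketched as follows. The $1000 per server per year, $90/mo colocation and $855/mo EC2 figures come from the thread; the 300-server fleet is a hypothetical stand-in for "hundreds of servers", and hardware purchase cost is deliberately left out.

```python
annual_ops_per_server = 1000.0   # $/server/year, figure from the comment
colo_monthly = 90.0              # $/month, colocation price quoted earlier
ec2_monthly = 855.0              # $/month, EC2 estimate quoted earlier

ops_monthly = annual_ops_per_server / 12
bare_metal_monthly = colo_monthly + ops_monthly


def team_budget(servers, per_server_per_year=annual_ops_per_server):
    """Annual ops budget implied by a per-server cost; `servers` is hypothetical."""
    return servers * per_server_per_year


print(f"ops per server: ${ops_monthly:.2f}/mo")
print(f"colo + ops: ${bare_metal_monthly:.2f}/mo vs EC2 ${ec2_monthly:.2f}/mo")
print(f"budget for a hypothetical 300 servers: ${team_budget(300):,.0f}/yr")
```

Whether that budget actually pays for a 2 or 3 person team depends on salaries, which the thread does not state.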

> What about the personnel costs?

I have no idea where people came from when they moved to AWS/etc, but I have always picked up the phone (and still do) and gotten someone on the line who usually fixes whatever the issue is while I'm still on the phone.

As far as demand spikes - even smaller hosting companies will have hardware they can spin up pretty quick.

Actually, unfortunately I think that the price is more like 9 cents per GB, so the calculation looks even worse. The $0.01/GB you saw must be across zones within AWS?

So ~$900/month for bandwidth alone.
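A small sketch of how much the egress rate moves the estimate, using the rounded compute and disk figures from the earlier comment and assuming 1 TB = 1000 GB. Both rates are the ones mentioned in the thread and are not checked against any current price list.

```python
egress_gb = 10 * 1000   # 10 TB/month, assuming 1 TB = 1000 GB
compute = 620.0         # $/month, rounded figure from the earlier comment
disk = 135.0            # $/month, from the earlier comment


def monthly_bill(egress_rate):
    """Return (bandwidth cost, total bill) for a given $/GB egress rate."""
    bandwidth = egress_gb * egress_rate
    return bandwidth, compute + disk + bandwidth


for rate in (0.01, 0.09):
    bandwidth, total = monthly_bill(rate)
    share = bandwidth / total
    print(f"${rate:.2f}/GB: bandwidth ${bandwidth:.0f}, "
          f"total ${total:.0f}/mo, bandwidth share {share:.0%}")
```

At the higher rate, bandwidth alone exceeds the compute and disk lines combined.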

The disk and CPU costs for me are worth it, much better to pay $600/month and let that be someone else's headache. But the bandwidth makes this totally a nonstarter.

Out of curiosity, what are you moving?

I'm also not sure how these smaller datacenters somehow charge so much less for bandwidth.

Care to share more details?

On paper, the cost differences between renting/colocating traditional dedicated servers and using cloud service with similar performance/capacity is huge, so it's pretty unusual to actually save money by migrating to cloud. I'd love to hear where the savings actually come from.

My company has done some research on that, in the webscale bubble. It is wrong to think of the divide as "Cloud" vs "Bare Metal" in the sense of one VM for one server. Disregarding up-front investment, you're comparing rented VMs in the cloud with bare metal boxes, datacenter costs, and most importantly, manpower.

At a small scale, you're paying the cloud less than 2 or 3 competent admins with datacenter and networking experience would cost in salary. In that situation, you're saving money, because you're saving the hardware ops team. You're paying more dollars per IOP, core and GB of RAM, but the alternative method of obtaining these resources has more surrounding costs than "ze cloud".

On the other hand, once you're shoveling several hundred kilo-dollars per month to a cloud provider, it makes sense to throw a million or 10 at dell and hire those operators, because it will save money within a year or two. This only makes sense if you need the resources, but if you need those resources, it helps being more efficient.
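The break-even reasoning can be written as a one-line formula. In this sketch every dollar amount is a hypothetical placeholder chosen to match the orders of magnitude mentioned ("several hundred kilo-dollars per month", "a million or 10"); none of them are figures from the commenter's research.

```python
from typing import Optional


def break_even_months(capex: float, cloud_monthly: float,
                      onprem_monthly: float) -> Optional[float]:
    """Months until the hardware purchase is paid back by lower running costs.

    Returns None when running your own hardware is not cheaper per month,
    in which case the purchase never pays for itself.
    """
    savings = cloud_monthly - onprem_monthly
    if savings <= 0:
        return None
    return capex / savings


# Hypothetical inputs: $3M of hardware, $300k/mo cloud bill,
# $100k/mo for operators plus datacenter space.
months = break_even_months(3_000_000, 300_000, 100_000)
print(f"break-even after {months:.0f} months")

# Hypothetical small shop: the ops team costs more than the cloud bill.
print(break_even_months(1_000_000, 50_000, 100_000))
```

The second call illustrates the small-scale case from the previous paragraph, where the operators cost more than the cloud bill and the function reports no break-even.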

Cloud may save from hardware/driver issues - by providing an already tested environment, that doesn't (normally) trip on some weirdness in, say, network drivers.

But if the problem is in the software stack, one would still need competent sysadmins/system engineers with the skills to diagnose and resolve the issue. When (just a random example) oom-killer wreaks havoc and free(1) insists there's more than half of physical memory still available, it doesn't matter whether one's in the cloud or not.

I believe, skilled system engineers are still a requirement for any large project, be it in cloud or not.

> I believe, skilled system engineers are still a requirement for any large project, be it in cloud or not.

And that's why cloud will win every single time, forever, on all metrics.

Because the cloud requires fewer engineers to achieve the same work, as it takes care of the low-level hardware work.

How is that different from hosting provider engineers taking care of setting up a dedicated server and even pre-installing a tested OS image? They also usually handle all the hardware-related issues you may encounter.

That used to work well before the "cloud" era, and still does.

This is why I said hardware op.

If you use some VM-based cloud infrastructure you will still need a bunch of good system/linux admins to make your application work on the linux in the cloud. Without these, you're toast.

Bare metal requires you to have both good system and linux admins, and on top of that, good hardware admins, networks and datacenter hands. These are different skill sets.

So again: If you do the bare metal thing right at the scale where it pays for itself, you'll probably end up paying for the cloud deployment skillset and adding the hardware skillset on top.

Are hardware sysadmins a different caste? Unless you mean having your own networking (like a router/ASA in addition to the server, or even your own private fiber), of course, and those CCNAs etc. Or those experts on some specialized hardware, like giant FC SANs or whatever one might fancy. If, when talking about "bare metal", we go to those extremes, then, sure, cloud is unbeatable.

I believe, usually, "bare metal" hosting means you order the hardware and get it installed, but networking/cooling/power supply/etc are done by the datacenter people, not your own staff (your own staff may not even be permitted to enter the server room). No less-common specialized hardware to deal with, either.

System engineers must know OS internals well. If they do, it's unlikely that they can debug, say, a kernel memory leak (which can be a thing in a VM), but not a lockup in a network card driver's interrupt handler (which is close to impossible in a cloud VM, but I saw this on bare metal). Maybe I'm wrong, but that would be, like, too specialized and just a weird sort of specialization. Am I wrong?

Hey, you are a liar. He is very sure that the cloud saving money has always been a damn lie! To hell with your data.

/s

what data?

"I can tell you with 100% certainty that the savings are there and have seen financial proof of the savings."

My attempted joke wouldn't make sense outside of this context