Hacker News
by jlebar 4020 days ago
To me, this sort of thing brings home the value of not running your own machines. Sure, Amazon's/Google's clouds have quirks, but it's far less likely that you're going to have to debug faulty hardware in this way. It sounds like a team of more than one person worked on this at least part-time for weeks -- how much is that worth? It's not just the cost of hiring extra people to do the work; often small companies simply can't hire enough good people -- when you do find them, do you want to squander them twiddling servers?
5 comments

If something similar happens to you on "cloud" infrastructure, you're very limited in what you can do to diagnose or work around the problem.

At a place I used to work, we had a reasonably large cluster of Windows boxes on Amazon. Randomly, Windows machines on Amazon would suddenly stop accepting new TCP connections.

This means that machines would be running fine, and then half your cluster starts dropping offline. At the time when this happened to us, there were no other reports we could find of this happening.

Turns out, it was a bug in the Xen virtual NIC driver: it wasn't running the offloaded TCP cleanup, so eventually the system couldn't accept any new connections. Once we figured out what was happening we could pre-emptively reboot boxes, but that was a problem for us for about 6 months iirc.
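
For anyone wanting to catch this kind of failure before users do, here is a rough sketch of a probe that checks whether a box still accepts new TCP connections. This is not what we actually ran; the function names, the timeout and the idea of feeding the result into a reboot list are all assumptions for illustration.

```python
import socket

def accepts_tcp(host, port, timeout=3.0):
    """Return True if a brand-new TCP connection to host:port completes in time."""
    try:
        # create_connection performs a full TCP handshake, which is exactly
        # the step that stops working in the failure described above.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def hosts_needing_attention(hosts, port, timeout=3.0):
    """Hypothetical helper: list the hosts that no longer accept connections."""
    return [h for h in hosts if not accepts_tcp(h, port, timeout)]
```

Run on a schedule from a machine outside the cluster, the returned list would be the candidates for a pre-emptive reboot.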

There are probably dozens of these bugs affecting someone on these cloud platforms at any one time. But because you have no access to the hardware, you don't even have the option of saying "Screw it, let's just get different hardware". You're at the mercy of your cloud provider.

There is no cloud - just other people's computers.

Many use-cases just require the job to be done on your own computers for security and privacy reasons. Yes, Amazon's and Google's services are in some ways less secure than your own computer, because they are hosted by companies which are subject to a government that doesn't value privacy, not even that of its own citizens. That means said government can, just to give a concrete example, NSL the companies to give up all they have about you, and you wouldn't even know.

When the government puts national security above fundamental human rights there is something dangerously wrong.

Thinking about individual computers will lead you astray. There are, rather, sets of machines (from single boxes to entire data-centers) that are managed by a given sysadmin staff. The more machines they manage, the more likely it is that problems will have institutionalized and operationalized solutions.

A cloud is just a sysadmin staff with a Sufficiently Large Deployment to have ironed out all the kinks in their hardware.

Or the more likely it is that they'll skip the advanced stuff in order to increase profit, as long as the result is a microscopic delta better than running it yourself for most customers, most of the time, on average. The microscopic delta may not be measurable or noticeable by the end users, of course.

That's assuming their business model doesn't presume an infinite supply of future customers, where in the short term everything is fine as long as revenue per customer exceeds cost of sales per customer. In that model, support costs that exceed the average cost of sales must be beaten down or ignored; otherwise it's cheaper to let the customer go and have sales "earn" a replacement.

Finally, their sysadmins work for them, to meet their corporate objectives on various meaningless metrics, which need not align in any way with your own corporate objectives.

> A cloud is just a sysadmin staff with a Sufficiently Large Deployment to have ironed out all the kinks in their hardware.

By that definition, I don't think there are any clouds.

True, by the literal definition. I continue to interpret "cloud" as "that mysterious part in the middle of the diagram which is a clean encapsulation of Somebody Else's Problem that never bothers you"; obviously, there are no true "clouds" (and there cannot be) by that definition.

But people can try, and they can get close; and one can say that something is a cloud to the degree that it manages to fulfill the "amorphous shape in your diagram you don't have to worry about" promise. So there are some 80%-clouds, some 95%-clouds, some 99.995%-clouds, and so on.

The point I was trying to make is that the degree to which a cloud achieves that promise is correlated with the size (and longevity, and homogeneity) of the deployment. The more man-years have gone into taking care of a given server type at a given DC, the more institutional knowledge is ready-at-hand to solve a problem on your machine of that type, and so the fewer issues become emergencies that break out of the "cloud" abstraction to require your attention.

And it was a reply to the parent precisely because a security problem is just such an "emergency" that represents a failure of institutional knowledge: I would much sooner trust AWS's KMS to not leak my private keys than I would trust a machine I was running myself to not leak my private keys. I'm a much worse sysadmin than AWS!

This is true but not relevant to the parent comment's concerns about security/privacy.
Let's do some maths on that claim. AWS: c3.8xlarge with 32 "CPUs" and 60 gigs of RAM.

For the machine alone it's $1200 a month. Bear in mind it's on shared infrastructure, with noisy neighbours. You'll see about 10-30% CPU steal. In practice you'll see performance about half that of a real machine (from my comparisons).

Then you'll need to factor in disks as well. First things first: EBS is dogshit slow. Yes, ephemeral disks are fast, but then they die, so you're in the same situation. However, you need 10gig networking to get low latency, avoid puncturing the cache, etc.

For EBS, the maximum IOPS you can be guaranteed is 20,000, and you need 1 TB for that.

For the IOPS, that's $1300 a month, plus $125 for the 1 TB of storage.

So per machine it'll be $2625 a month, or $31500 per machine per year.

Every 6 months, you could buy a new machine, which is faster than the fastest EC2 instance + EBS.
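
Spelling that arithmetic out as a quick sketch (Python; the dollar figures are just the ones quoted above, not current AWS prices, and the names are made up for illustration):

```python
# Monthly figures in USD, taken from the comment above (assumed, not current pricing).
INSTANCE_MONTHLY = 1200          # c3.8xlarge
PROVISIONED_IOPS_MONTHLY = 1300  # 20,000 provisioned IOPS on EBS
STORAGE_MONTHLY = 125            # 1 TB EBS volume

def monthly_cost():
    """Total monthly cost of one instance plus its EBS volume."""
    return INSTANCE_MONTHLY + PROVISIONED_IOPS_MONTHLY + STORAGE_MONTHLY

def cost_over(months):
    """Cumulative cost of one machine over the given number of months."""
    return monthly_cost() * months

if __name__ == "__main__":
    print("per month:", monthly_cost())
    print("per year:", cost_over(12))
    print("per 6 months:", cost_over(6))
```

Six months of that spend comes to $15750, which is the hardware budget the "new machine every 6 months" claim implies; the actual price of a comparable server is not given here.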

Now, the OP stated that they have more than one machine. Obviously one could use reserved instances. However, one could similarly negotiate volume discounts.

There is of course the cost of internet and cooling; you're looking at around $500 a month for half a rack, depending on power consumption (if you're colo'ing).

From a valuation point of view, having hardware counts towards your value, as it's an asset you actually own. More importantly, you can use it to lower your tax bill and reduce your run rate, in exchange for an up-front cost.

Now, if you have a lot of bursty traffic that doesn't require much DB activity, then AWS is perfect, as the Elastic Load Balancer allows you to spin up machines on demand. However, that's not that helpful for databases. Sure, you can warm migrate from an EBS snapshot, but you'd best do it quick, otherwise you'll overload an already overloaded DB.

With our architecture, HW requirements, the price of HW and the price of the cloud VMs, even working on this for a week or two saves us a significant amount of money both short-term and long-term. The side effect is that we now have tools to recover servers way faster, which allows us to do things we had not thought about before.
Agreed. Additionally, some business models simply don't mesh with cloud infrastructure pricing no matter the volume. There are definitely advantages to using cloud services, but most of the time bare metal gets you more hardware/performance at a lower cost in the long run, even when you factor in everything else that it entails.
The thing people forget is that the cloud provider has the same issues and expenses. That cost is passed on to the clients. Now, they may be more efficient etc., but once you reach a certain scale, and it's less than people think, you might as well get it done in house if you can find qualified people.
How can people forget, when that cost is right there in the price tag? If anything, it's easier to overlook the costs of running your own hardware, since they aren't immediately apparent.
My experience is that people don't understand cloud pricing at all.

First of all, they tend to not look at monthly prices, and are seduced into thinking their instances are cheap. Secondly, they are seduced into thinking they are spending less ops time, though in my experience it's the reverse. Thirdly, people "forget" about extras like bandwidth costs (which are extortionate at all the big cloud providers), extra storage volumes etc.

Then when people get the bill, it often gets back-rationalised as being ok because it's cloud so it must be cheap.

The greatest innovation AWS did was finding a way to get people to pay absolutely insane rates for hosting.

They simply underestimate the ops cost, and often focus on the monthly cost. The thing that cloud providers like AWS are good at, and IMO the only reason you should choose them, is handling highly variable loads. Dynamic scaling is something only they can do because they have such a massive scale. Even if you're relatively small and cannot justify hiring a sysadmin, there are plenty of consultants out there you can hire.
I agree with you. I don't think it makes sense except for very large companies.