by smanek 5426 days ago
It costs engineering time to do so. Time that could otherwise be used to build features, better protect against more common failures, attract users, etc.

Amazon probably has ~5 hrs/year of complete failure of a region. Figure, conservatively, it would take 3 months of engineering time to protect against that, plus a continuing cost of half a week per month to maintain that protection. You'd also have to (at least) double your provisioned capacity (which may include a larger ops team, etc.). Assuming your servers cost $20k/month and devs cost $100/hr (both fully loaded), we're talking about ~$340,000 to prevent 5 hours of downtime (just for the first year).

If downtime costs you more than $50K/hr, then it might make sense to be that fault tolerant. Otherwise, there might be better places for a startup to spend its (limited) resources.
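
The estimate above can be spelled out as a rough sketch. The rates and the 5 hours come from the comment; the 40-hour week, counting 3 months as 13 weeks, and charging upkeep for all 12 months are assumptions added here. Under those assumptions the total lands a little below the ~$340,000 quoted, and the break-even sits in the same ballpark as the $50K/hr threshold.

```python
# Figures marked "comment" come from the post; the rest are assumptions.
DEV_RATE = 100            # $/hr, fully loaded (comment)
SERVER_COST = 20_000      # $/month for current capacity (comment)
OUTAGE_HOURS = 5          # hrs/year of full-region failure (comment)

HOURS_PER_WEEK = 40       # assumption
BUILD_WEEKS = 13          # assumption: 3 months of engineering time
UPKEEP_MONTHS = 12        # assumption: upkeep charged for the whole year


def first_year_cost():
    """First-year cost of surviving a full-region outage, in dollars."""
    build = BUILD_WEEKS * HOURS_PER_WEEK * DEV_RATE
    # Half a week per month of maintenance = 20 hours per month.
    upkeep = UPKEEP_MONTHS * (HOURS_PER_WEEK // 2) * DEV_RATE
    # Doubling capacity means paying the server bill a second time.
    extra_servers = 12 * SERVER_COST
    return build + upkeep + extra_servers


def break_even_per_hour():
    """Downtime cost per hour at which the protection pays for itself."""
    return first_year_cost() / OUTAGE_HOURS


if __name__ == "__main__":
    print(first_year_cost())       # 316000
    print(break_even_per_hour())   # 63200.0
```

The doubled server bill ($240,000) is most of the total, so the conclusion is far more sensitive to the capacity assumption than to the engineering-hours guess.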

2 comments

Not to mention that it's very easy to increase overall downtime by introducing all the extra complexity this kind of redundancy can bring.

It costs engineering time to simply choose Amazon in the first place. You could spin up VPSes at Rackspace or dedicated machines elsewhere for less money, and have a local, reliable, fast hard drive.

Instead, to go with Amazon you have to architect for Amazon: you can't count on your EC2 instances being up all the time, you have to account for their fast local storage going away, and you have to account for EBS, which is persistent but slow.

The alternative is to get the enterprise version of Riak, purchase dedicated nodes in two data centers, and tell them about each other (no engineering required).

If engineering resources are the most precious commodity, it seems AWS is the more expensive option.