HN Mirror

Hacker News


	by PaulHoule 4427 days ago
	This happens in AWS too. In the cloud you have to assume nodes will go down and not have it be a disaster if one fails.

4 comments

osteele 4427 days ago

Ditto. It happens with physical servers too. If you can't survive unscheduled degradation or termination of a node, you aren't running a high availability service. (Note: not every production service needs to be HA.)
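A minimal sketch of what "surviving termination of a node" can look like from the client side, assuming a pool of interchangeable nodes: try each node in turn and give up only when all of them fail. The node names, `fake_request`, and `AllNodesFailed` are hypothetical. A real client would also need timeouts, health checks, and backoff.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllNodesFailed(Exception):
    """Raised when every node in the pool failed the request."""


def call_with_failover(nodes: Sequence[str], request: Callable[[str], T]) -> T:
    """Send the request to each node in order; return the first success."""
    errors = []
    for node in nodes:
        try:
            return request(node)
        except ConnectionError as exc:
            # Node is down or unreachable: remember why and try the next one.
            errors.append((node, exc))
    raise AllNodesFailed(errors)


# Hypothetical pool in which node-a has been terminated.
DOWN = {"node-a"}


def fake_request(node: str) -> str:
    if node in DOWN:
        raise ConnectionError(f"{node} unreachable")
    return f"served by {node}"


print(call_with_failover(["node-a", "node-b", "node-c"], fake_request))  # served by node-b
```

The point is that losing one node costs a retry, not an outage; losing every node still surfaces as an error.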

lachgr 4427 days ago

I agree. When you don't run your servers yourself, you have to accept occasional downtime as a calculated risk. Of course, if it happens often, you should consider moving to another host, but no one can guarantee that a single server will have 100% uptime with perfect recovery.

If the website of one of my clients goes down, it's not a disaster; it's fine if it's up and running again within a few hours, maybe a day. I understand it's not nice when this happens, but it's the risk you take by essentially outsourcing your hosting.

mnem 4427 days ago

AWS generally, although not always, shows an alert before this happens.

However, I totally agree: if you are running any sort of service where you pay a financial penalty when it goes down, then it's your responsibility to ensure your service's architecture can survive catastrophic failure of nodes. 1 machine running all the things isn't high availability and shouldn't be sold as such.
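A back-of-the-envelope way to see why one machine isn't high availability, assuming node failures are independent and failover is instantaneous (both optimistic assumptions): with n redundant nodes, each up a fraction a of the time, the service is down only when all of them are down, so its availability is 1 - (1 - a)^n. The 99% per-node figure below is an illustrative assumption, not a measured number.

```python
HOURS_PER_30_DAY_MONTH = 30 * 24


def parallel_availability(node_availability: float, n: int) -> float:
    """Availability of n redundant nodes when any single node suffices.

    Assumes independent failures and instantaneous failover.
    """
    return 1.0 - (1.0 - node_availability) ** n


def monthly_downtime_hours(availability: float) -> float:
    """Expected hours of downtime in a 30-day month at the given availability."""
    return (1.0 - availability) * HOURS_PER_30_DAY_MONTH


# Hypothetical nodes that are each up 99% of the time.
for n in (1, 2, 3):
    a = parallel_availability(0.99, n)
    print(f"{n} node(s): availability {a:.6f}, ~{monthly_downtime_hours(a):.3f} h down per 30 days")
```

With these assumptions a single node is down about 7.2 hours a month, while two nodes bring that to roughly 4 minutes. Correlated failures or a failover that doesn't fire erase most of that gain.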

coolboykl 4427 days ago

I do agree. We did set up for HA, but due to some bugs on our end, the cutover from our DB slave to master didn't happen. Will be more careful next time.
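For context, a slave-to-master cutover is usually driven by a monitor that notices the master has stopped sending heartbeats and promotes a replica. The sketch below is hypothetical and is not coolboykl's setup or any particular database's tooling. It assumes each node reports a heartbeat timestamp and a replication offset, and it promotes the most caught-up healthy replica.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    replicated_offset: int  # how far this node's copy of the log has advanced
    last_heartbeat: float   # seconds, on whatever clock the monitor uses


@dataclass
class Cluster:
    primary: Node
    replicas: list[Node] = field(default_factory=list)


def check_and_fail_over(cluster: Cluster, now: float, timeout: float) -> bool:
    """Promote the most caught-up healthy replica if the primary has gone silent.

    Returns True if a promotion happened, False if the primary is healthy.
    """
    if now - cluster.primary.last_heartbeat <= timeout:
        return False  # primary is healthy; nothing to do
    healthy = [r for r in cluster.replicas if now - r.last_heartbeat <= timeout]
    if not healthy:
        raise RuntimeError("primary is down and no healthy replica can be promoted")
    # Prefer the replica that has replicated the most data, to minimize loss.
    new_primary = max(healthy, key=lambda r: r.replicated_offset)
    cluster.replicas = [r for r in cluster.replicas if r is not new_primary]
    cluster.primary = new_primary  # the old primary is simply dropped here
    return True


cluster = Cluster(
    primary=Node("db-1", replicated_offset=500, last_heartbeat=0.0),
    replicas=[Node("db-2", 480, 28.0), Node("db-3", 495, 29.0)],
)
promoted = check_and_fail_over(cluster, now=30.0, timeout=10.0)
print(promoted, cluster.primary.name)  # True db-3
```

A real setup must also fence the old master so it cannot keep accepting writes if it comes back (split brain); this sketch only removes it from the cluster record. Because this path runs only after something has already failed, it is worth exercising deliberately rather than discovering its bugs during an outage.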

dylz 4427 days ago

Except AWS is an actual cloud, and DO is just a few VPSes, with private networking available in barely half the locations.

mnem 4427 days ago

What makes a cloud a cloud and not a VPS?

lazylizard 4427 days ago

maybe http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145... ?

mnem 4427 days ago

That's interesting. So, from that, the thing that stops DO (and Linode and so on) from being a cloud service is that you can't say "I used 2 CPUs at 100% for 12 hours"? It only allows the granularity of saying "the node was on for 12 hours"?
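To make the granularity distinction concrete, the sketch below contrasts the two metering styles mnem describes: instance-hours (how long the node was on) versus CPU-hours computed from per-CPU utilization samples. The function names and the hourly sampling scheme are assumptions for illustration, not any provider's billing API.

```python
from typing import Sequence


def instance_hours(uptime_intervals: Sequence[tuple[float, float]]) -> float:
    """Coarse metering: total hours the node was powered on, ignoring load.

    Each interval is a (start_hour, end_hour) pair.
    """
    return sum(end - start for start, end in uptime_intervals)


def cpu_hours(samples: Sequence[Sequence[float]], interval_hours: float) -> float:
    """Fine-grained metering: each sample lists per-CPU utilization (0.0 to 1.0)
    over one interval; CPU-hours = sum of utilization * interval length."""
    return sum(sum(per_cpu) for per_cpu in samples) * interval_hours


# Hypothetical node: 2 CPUs pinned at 100%, sampled hourly for 12 hours.
busy = [[1.0, 1.0]] * 12
print(cpu_hours(busy, interval_hours=1.0))  # 24.0
print(instance_hours([(0.0, 12.0)]))        # 12.0

# Same 12 hours, mostly idle: instance-hours stay the same, CPU-hours drop.
idle = [[0.5, 0.0]] * 12
print(cpu_hours(idle, interval_hours=1.0))  # 6.0
```

So "2 CPUs at 100% for 12 hours" comes to 24 CPU-hours, while "the node was on for 12 hours" records 12 instance-hours regardless of how busy the CPUs were.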
