by sophacles
5715 days ago
Thanks! My app was in that datacenter too. I mean, it had replicated MongoDB instances and well-balanced app servers, and nodes going away had no discernible effect on users. Turns out, though, that with all that distributed engineering, I didn't find out that the hosting company doesn't put your nodes in different data centers. That tornado would have taken down my service during peak hours. I know you will try to write that off as "you get what you deserve", but I challenge you to go ask people if their apps would survive a tornado hitting the data center. Many of them will say "Sure, it's in the cloud!" Then drop the killer question on them: "How many different data centers are your nodes running on right now?" Most will say "I don't know". Some will say "My host has many data centers" (note this doesn't answer the question). A few will actually have done the footwork. Also, the scenario you describe is as easily mitigated with hot failovers and offsite backups. This probably qualifies as distributed engineering, but is only the same as the above discussions in the most pedantic sense.
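The "killer question" above can be checked mechanically if you keep a record of where each node runs. A minimal sketch, assuming a hypothetical inventory of (node, role, data center) tuples; the node names, roles, and data-center labels are made up for illustration, and real data would have to come from your host or your own records:

```python
from collections import defaultdict


def roles_at_risk(inventory):
    """Return the roles whose nodes all sit in a single data center.

    `inventory` is an iterable of (node, role, datacenter) tuples.
    This layout is an assumption for illustration, not any host's API.
    """
    sites = defaultdict(set)
    for _node, role, datacenter in inventory:
        sites[role].add(datacenter)
    # A role is at risk if losing one data center removes every node for it.
    return sorted(role for role, dcs in sites.items() if len(dcs) < 2)


# Hypothetical inventory: app servers are spread out, the database is not.
inventory = [
    ("app-1", "app", "dc-a"),
    ("app-2", "app", "dc-b"),
    ("mongo-1", "mongodb", "dc-a"),
    ("mongo-2", "mongodb", "dc-a"),
]

print(roles_at_risk(inventory))  # ['mongodb']
```

This only answers where the nodes are placed. It says nothing about whether the service would actually keep working after losing a site.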
"Also, the scenario you describe is as easily mitigated with hot failovers and offsite backups."
This is a sadly wrong, though common, belief. There is exactly one way to know that a component in your infrastructure is working: you are using it. There is no such thing as a "hot failover": there are powered-on machines you hope will work when you need them. Off-site backups? Definitely a good idea. Ever tested a restore of a petabyte?
Here's a simple test. If you believe either of the following is true, then with extremely high probability you have never done large-scale operations:
1) There exists a simple solution to a distributed systems problem.
2) "Failover", "standby", etc. infrastructure can be relied upon to work as expected.
Extreme suffering for extended periods burns away any trust one might ever have had in either of those two notions.