Hacker News
by why-el 3793 days ago
> if you're not regularly cutting power to your data center, you're not building resilience to such a thing happening

Would love to read examples of who is doing this and how. Reminds me of Netflix's Chaos Monkey, only applied to electricity. :p

3 comments

There's a mention of Facebook regularly doing this in the summary section of this Instagram engineering post: http://engineering.instagram.com/posts/548723638608102/

EDIT: Here's more info: http://www.datacenterknowledge.com/archives/2014/09/15/faceb...

Awesome, thank you. :)

I remember reading a few years back that Yahoo takes a random data center offline once a week, just to make sure they can do so without issues. They probably didn't actually cut the power ;) But they used it as an argument against investing too much in emergency generators and such: those will fail or cause accidents, and you need the ability to fail over either way, so make it routine.

I think trying to cut power at least once is better if it's possible. The reason is that digital is just an abstraction over analog electrical activity. Plus there's actual analog circuitry in there doing work, too. So, seeing how all the chips in there respond to an actual, instantaneous loss of power would be an interesting test of the models they're built against.

As a commenter above mentioned, weird activity in the electrical system can make some products go haywire and even corrupt data in unexpected ways. Of course, simulated takedowns and all appropriate measures for countering common issues should've already happened before a real one. Just to be extra clear there.

Google wrote an article about disaster recovery in 2012. https://queue.acm.org/detail.cfm?id=2371516