by winfred 3797 days ago
I worked in large data centers before and I just don't see how this can be done practically. Data centers require quite a bit of physical maintenance.

Every computer design has some element that will render a large part of the design inoperable if it fails. Either it is a SAN head (even if you have two, the failover can malfunction) or a switch setup.

Then there are things like failures of simultaneously purchased components (hard drives purchased at the same time and worked under the same load will fail at roughly the same time).

2 comments

Cloud datacenters are not complex heterogeneous mixes of components. There's no SAN head. It's one thing multiplied + some networking gear. Even if a top-of-rack switch fails they're still not going to yank the box yet, because too much maintenance at this scale would raise the TCO. They wait for their maint interval and fix everything at once (or just upgrade the hardware).
Think of a farm of small data center pods with cloud apps. When failures in a pod exceed a useful threshold, apps are migrated out to other pods and the pod is retrieved, serviced and returned to its place.
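
To make the threshold idea concrete, here is a rough Python sketch. Everything in it is made up for illustration: the `Pod` class, the 20% threshold, and the "fewest apps wins" placement rule are assumptions, not anything a real operator is known to use.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    # Hypothetical model of one sealed pod; field names are illustrative.
    name: str
    total_servers: int
    failed_servers: int = 0
    apps: list = field(default_factory=list)

    @property
    def failure_ratio(self) -> float:
        return self.failed_servers / self.total_servers


def pods_to_service(pods, threshold=0.2):
    """Return the pods whose failed fraction exceeds the (assumed) threshold."""
    return [p for p in pods if p.failure_ratio > threshold]


def evacuate(pod, pods, threshold=0.2):
    """Move every app off `pod` onto pods that are still under the threshold."""
    targets = [p for p in pods if p is not pod and p.failure_ratio <= threshold]
    if not targets:
        raise RuntimeError("no healthy pod available to receive apps")
    while pod.apps:
        app = pod.apps.pop()
        # Naive placement: send the app to the target hosting the fewest apps.
        target = min(targets, key=lambda p: len(p.apps))
        target.apps.append(app)


pods = [
    Pod("a", 100, failed_servers=30, apps=["web", "db"]),
    Pod("b", 100, failed_servers=5, apps=["cache"]),
    Pod("c", 100),
]
for pod in pods_to_service(pods):
    evacuate(pod, pods)  # after this the pod is empty and can be lifted out
```

A real scheduler would also have to check spare capacity on the receiving pods before draining one, which this sketch ignores.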

A custom-made barge with dynamic positioning gear and a grabbing/coupling system to detach the pod from the subsea grid, lift it, and then re-attach it would make the servicing relatively efficient.

I could see the roundtrip time for a full hardware replacement of a pod being under an hour, conceivably under 10-15 minutes.