Hacker News new | ask | show | jobs
by swombat 6522 days ago
According to the email I got from them:

At 12:15 PDT today, we experienced a failure of our switching infrastructure on cluster EY02. We are incredibly sorry about this issue and wish to convey all of the information we presently possess.

The result of this failure was blocked IO from some slices to the SAN shelves. Based on the logs and diagnostic output available to engineers, it seems likely a software issue with one of the modules in the switches is at fault. We have captured the necessary log entries from the switches, and we are working with the vendor to identify the cause of the errors. (In case you are curious - our switching infrastructure utilizes 4 stacked switches, all interconnected for full redundancy.)