|
|
|
|
|
by bcrl
638 days ago
|
|
FC is not entirely lossless. One ticket I had the joy of dealing with involved a customer using a Fibre Channel network for their storage using multipathd for failover. In theory it was a fully redundant configuration with dual FC ports on the server with each one going to a different FC switch all the way back to the SAN. However, the system was generating I/O errors on large writes while small writes would succeed. Needless to say that ext4 failed horribly, and there were worries that it was a kernel bug in the FC driver. After a good amount of back and forth with the customer, and several test programs run on the system in question, I eventually came up with a hypothesis that there was an error in the write path of the SAN as small writes succeeded while larger writes failed. The customer ultimately found there was a dirty fibre on one of the links in their FC fabric. It was dirty enough to corrupt large packets, but not so dirty that smaller writes and control packets were unable to get through. Since multipathd only checks to see if a given target can be read from, it would never fail over to the other path (which was fine). So much for trying to build a high availability system using an expensive SAN! Lesson of the story: what you think is a lossless network is not always lossless. Using the IP stack has a lot of beneficial diagnostic tools that you really start missing when something goes awry in a non-IP network. |
|