| I don't know how NFS keeps coming up. It's an entirely different use case. It doesn't help the credibility of a critique on networked block storage to harp on a vendor specific implementation of a technology that doesn't even operate in the same sphere. An NFS server is very simple. With NFS on it's own VLAN, and some very basic QoS, there's no reason an NFS server should be the weak point in your infrastructure. Especially since it's resilient to disconnection on a flaky network. If you're looking for 100% availability, sure, NFS is probably not the answer. If on the other hand you're running a website, and would rather trade a few bad requests for high-availability and portability, then NFS can be a great fit. None of that has anything to do with EBS or block-storage though. Joyent's position is that iSCSI was flaky for them because of unpredictable loads on under-performing equipment. The situation would degrade to the point that they could only attach a couple VM hosts to a pair of servers for example, and they were slicing the LUNs on the host, losing the flexibility networked block-storage provides for portability between systems. Here's what we do: We export an 80GB LUN for every running application from two SAN systems. These systems are home-grown, based on Nexenta Core Platform v3. We don't use de-dupe since the DDT kills performance (and if Joyent was using it, then is local storage without it really a fair comparison?). We provide SSDs for ZIL and ARCL2. These LUNs are then mirrored on the Dom0. That part is key. Most storage vendors want to create a black-box, bullet-proof "appliance". That's garbage. If it worked maybe it wouldn't be a problem, but in practice these things are never bullet-proof, and a failover in the cluster can easily mean no availability for the initiators for some short period of time. If you're working with Solaris 10, this can easily cause a connection timeout. Once that happens you must reboot the whole machine even if it's just one offline LUN. It's a nightmare. Don't use Solaris 10. snv_134 will reconnect eventually. Much smoother experience. So you zpool mirror your LUNs. Now you can take each SAN box offline for routine maintenance without issue. If one of them out-right fails, even with dozens of exported LUNs you're looking at a minute or two while the Dom0 compensates for the event and stops blocking IO. These systems are very fast. Much faster than local storage is likely to me without throwing serious dollars at it. These systems are very reliable. Since they can be snapshotted independently, and the underlying file-systems are themselves very reliable, the risk of data-loss is so small as to be a non-issue. They can be replicated easily to tertiary storage, or offline incremental backup easily. To take the system out, would require a network melt-down. To compensate for that you spread link-aggregated connections across stacked switches. If a switch goes down, you're still operational. If a link goes down, you're still operational. The SAN interfaces are on their own VLAN, and the physical interfaces are dedicated to the Dom0. The DomU's are mapped to their own shared NIC. The Dom0, or either of it's NICs is still a single point of failure. So you make sure to have two of them. Applications mount HA-NFS shares for shared media. You don't depend on stupid gimmicks like live-migration. You just run multiple app instances and load-balance between them. You quadruple your (thinly provisioned) storage requirements this way, but this is how you build a bullet-proof system using networked storage (both block (iSCSI) and filesystem (NFS)) for serving web-applications. If you pin yourself to local storage you have massive replication costs, you commit yourself to very weak recovery options. Locality of your data kills you when there's a problem. You're trading effective capacity planning for panic fixes when things don't go so smoothly. This is why it takes forever to provision anything at Rackspace Cloud, and when things go wrong, you're basically screwed. Because instead of proper planning, they'd rather not have to concern themselves with availability of your systems/data. It's not a walk in the park, but if you can afford to invest in your own infrastructure and skills, you can achieve results that are better in every way. Sure, you might not be able to load a dozen high-traffic Dom0's onto these SAN systems, but that matters mostly if you're trying to squeeze margins as a hosting provider. Their problems are not ours... |
When you move sqlite to NFS, for example, file locking probably won't work. There is nothing to tell you this.
It sounds like you have experience making NFS work well, but I don't see how anything you wrote addresses this point. In fact I think you're just echoing some of the article's points about "enterprise planning". AFAICT you come from the enterprise world and are advocating overprovisioning, which is fine, but not the same context.