by silverquiet 852 days ago
> Anyway, many people fail to understand that Azure Storage works more like a SAN than a directly attached disk--when you attach a disk volume to the VM, you are actually attaching a _replica set_ of that storage that is at least three-way replicated and distributed across the datacenter to avoid data loss. You get RAID for free, if you will.

I've said this a bit more sarcastically elsewhere in this thread, but basically, why would you expect people to understand this? Cloud is sold as abstracting away hardware details and giving performance SLAs billed by the hour (or minute, second, whatever). If you need to know significant details of their implementation, then you're getting to the point where you might as well buy your own hardware and save a bunch of money (which seems to be gaining some steam in a minor but noticeable cloud repatriation movement).


Well, in short, people need to understand that cloud is not their computer. It is resource allocation with underlying assumptions around availability, redundancy and performance at a scale well beyond what they would experience in their own datacenter.

And they absolutely must understand this to avoid mis-designing things. Failure to do so is just bad engineering, and a LOT of time is spent educating customers on these differences.

A case in point: I used to work with Hadoop clusters, where data replication serves both redundancy and distributed processing. Moving Hadoop to Azure while keeping the conventional design rules (i.e., tripling the amount of disk) is the wrong way to do things, because the extra copies are needed neither for redundancy nor for performance (the storage resources already cater for both).

(Of course there are better solutions than Hadoop these days, Spark being one that is very nice from a cloud resource perspective, but many people have nine times the storage they need allocated in their cloud Hadoop clusters because of a lack of understanding...)
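The "nine times" figure comes from stacking two replication layers on top of each other. A minimal sketch of the arithmetic, assuming HDFS's default replication factor of 3 (`dfs.replication`) and that each cloud disk is itself backed by three copies, as the quoted comment describes. The function names and the 10 TB figure are hypothetical, for illustration only.

```python
def physical_copies(app_replication: int, storage_replication: int) -> int:
    """Physical copies of each block when an application-level replication
    layer (e.g. HDFS) runs on top of storage that is already replicated."""
    return app_replication * storage_replication


def raw_tb(logical_tb: float, app_replication: int, storage_replication: int) -> float:
    """Raw capacity (TB) consumed in the datacenter for logical_tb of data."""
    return logical_tb * physical_copies(app_replication, storage_replication)


# On-prem HDFS layout moved as-is onto three-way replicated cloud disks.
lift_and_shift = physical_copies(app_replication=3, storage_replication=3)
# HDFS replication lowered to 1, relying on the storage layer for redundancy
# (the approach the comment above argues for).
cloud_aware = physical_copies(app_replication=1, storage_replication=3)

print(lift_and_shift)          # 9
print(cloud_aware)             # 3
print(raw_tb(10.0, 3, 3))      # 90.0
```

Dropping HDFS replication only makes sense if you accept the premise above that the storage layer covers both redundancy and throughput; HDFS also uses its replicas to schedule work close to the data, which buys less when the disks are network-attached anyway.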

I would think that lifting and shifting a Hadoop setup into the cloud would be considered an anti-pattern anyway; typically you would be told to find a managed, cloud-native solution.

You would be surprised at what corporate culture and procurement departments actually consider best.

The cloud is also being sold as "don't worry about data loss".

To actually deliver on that promise while maintaining the abstraction of just "dump your data on C:/ as you are used to", some performance compromises have to be made. This is one of the biggest pitfalls of the cloud if you care more about performance than resiliency. It is still possible to find disks without such guarantees; just be aware of the trade-off.
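One way to see where the performance compromise comes from: if a write is acknowledged only after every replica has persisted it (an assumption here, modelling the three-way replication described earlier in the thread), the caller waits for the slowest replica instead of a single local disk. A toy model with made-up latencies; the exponential distribution and its 1 ms mean are hypothetical, not measured cloud figures.

```python
import random
import statistics


def replicated_write_latency(replica_latencies_ms: list[float]) -> float:
    """A synchronous write completes when the slowest replica acknowledges."""
    return max(replica_latencies_ms)


def simulate(n_writes: int, n_replicas: int, seed: int = 42) -> tuple[float, float]:
    """Return (median single-copy latency, median replicated latency).

    Each replica's latency is drawn from an exponential distribution with a
    1 ms mean (hypothetical). The single-copy case reuses the first replica's
    sample, so a replicated write is never faster than its single-copy twin.
    """
    rng = random.Random(seed)
    single, replicated = [], []
    for _ in range(n_writes):
        samples = [rng.expovariate(1.0) for _ in range(n_replicas)]
        single.append(samples[0])
        replicated.append(replicated_write_latency(samples))
    return statistics.median(single), statistics.median(replicated)


single_ms, replicated_ms = simulate(n_writes=10_000, n_replicas=3)
print(f"single copy: {single_ms:.2f} ms, 3 replicas: {replicated_ms:.2f} ms")
```

This ignores network hops and quorum schemes (which wait for a majority rather than all replicas); it only illustrates why durable, replicated writes cost latency, and why disks without replication guarantees can be faster.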