HN Mirror


	by loeg 883 days ago
	Why would you bother with a distributed filesystem when you don't have to?

6 comments

nh2 882 days ago

One reason for using Ceph instead of other RAID solutions on a single machine is that it supports disk failures more flexibly.

In most RAIDs (including ZFS's, to my knowledge), the set of disks that can fail together is static.

Say you have physical disks A B C D E F; a common setup is to group RAID1'd disks into a pool such as `mirror(A, B) + mirror(C, D) + mirror(E, F)`.

With that, if disk A fails, and then later B fails before you replace A, your data is lost.

But with Ceph, and replication `size = 2`, when A fails, Ceph will (almost) immediately redistribute your data so that it has 2 replicas again, across all remaining disks B-F. So then B can fail and you still have your data.

So in Ceph, you give it a pool of disks and tell it to "figure out the replication" itself. Most other systems don't offer that; the human defines a static replication structure.
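To make the A-then-B scenario concrete, here is a toy Python sketch of the two behaviours. It is not how Ceph actually places data (Ceph uses CRUSH and placement groups); the function names, the random placement, and the assumptions that disks never fill up and that healing finishes before the next failure are all made up for illustration.

```python
import random

DISKS = {"A", "B", "C", "D", "E", "F"}
MIRRORS = [{"A", "B"}, {"C", "D"}, {"E", "F"}]  # static layout from the example above


def static_pool_survives(failed):
    """A pool of mirrors is lost as soon as any one mirror has lost all of its disks."""
    return all(not mirror <= failed for mirror in MIRRORS)


def place(num_objects, disks, size, rng):
    """Put `size` replicas of each object on distinct, randomly chosen disks."""
    return {obj: set(rng.sample(sorted(disks), size)) for obj in range(num_objects)}


def fail_and_heal(placement, alive, failed_disk, size, rng):
    """Remove one disk, then re-create missing replicas on the remaining disks.

    Assumption: healing completes before the next failure and capacity is unlimited.
    """
    alive = alive - {failed_disk}
    healed = {}
    for obj, replicas in placement.items():
        survivors = replicas - {failed_disk}
        if not survivors:
            raise RuntimeError(f"object {obj} lost all replicas")
        spare = sorted(alive - survivors)
        missing = size - len(survivors)
        healed[obj] = survivors | set(rng.sample(spare, min(missing, len(spare))))
    return healed, alive


def simulate(failures, size=2, num_objects=100, seed=0):
    """Fail disks one at a time, healing after each failure."""
    rng = random.Random(seed)
    alive = set(DISKS)
    placement = place(num_objects, alive, size, rng)
    for disk in failures:
        placement, alive = fail_and_heal(placement, alive, disk, size, rng)
    return placement, alive


if __name__ == "__main__":
    # Static mirrors: losing A and then B destroys mirror(A, B).
    print(static_pool_survives({"A", "B"}))  # False
    # Re-replicating pool: after A and B fail in sequence, every object
    # still has 2 replicas on the surviving disks.
    placement, alive = simulate(["A", "B"])
    print(all(len(r) == 2 and r <= alive for r in placement.values()))  # True
```

The catch the sketch hides is the healing window: if B fails before the data from A has been re-replicated, objects whose only two replicas were on A and B are still lost.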

imiric 883 days ago

For the same reason you would use one in enterprise deployments: if set up properly, it's easier to scale. You don't need to invest in a huge storage server upfront, but could build it out as needed with cheap nodes. That assumes it works painlessly as a single-node filesystem, which I'm not yet convinced the existing solutions do.

loeg 883 days ago

> if set up properly, it's easier to scale

For home use/needs, I think vertical scaling is much easier.

imiric 883 days ago

Not really. Most consumer motherboards have a limited number of SATA ports, and server hardware is more expensive, noisy and requires a lot of space. Consumers usually go with branded NAS appliances, which are also expensive and limited in how far they can scale.

Setting up a cluster of small heterogeneous nodes is cheaper, more flexible, and can easily be scaled as needed, _assuming_ that the distributed storage software is easy to work with and trouble-free. This last part is what makes it difficult to set up and maintain, but if the software is stable, I would prefer this approach for home use.

matheusmoreira 883 days ago

I'm indifferent towards the distributed nature thing. What I want is Ceph's ability to pool any combination of drives of any make, model and capacity into organized, redundant, fault-tolerant storage, and its ability to add arbitrary drives to that pool at any point in the system's lifetime. RAID-like solutions require identical drives and can't be easily expanded.
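A rough back-of-the-envelope sketch in Python of why mixed drive sizes matter. Both functions are simplified, hypothetical models, not any tool's real accounting: the first assumes each mirror pair is limited to its smaller drive, the second gives an idealized upper bound for keeping 2 copies of everything on distinct drives in one pool (real systems, Ceph included, will land somewhat below it). The drive sizes are made-up examples.

```python
def mirrored_pairs_capacity(pairs):
    """Usable capacity when drives are grouped into fixed 2-way mirrors.

    Simplified assumption: each mirror can only use the size of its smaller drive.
    """
    return sum(min(a, b) for a, b in pairs)


def pooled_two_replica_capacity(drives):
    """Idealized upper bound for 2 replicas on distinct drives in one pool.

    Every byte is stored twice, so usable space is at most half the total.
    The largest drive also needs partners for everything it holds, so usable
    space cannot exceed the combined size of all the other drives.
    """
    total = sum(drives)
    return min(total / 2, total - max(drives))


if __name__ == "__main__":
    # Hypothetical drive sizes in TB.
    print(mirrored_pairs_capacity([(4, 4), (8, 12)]))   # 12
    print(pooled_two_replica_capacity([4, 4, 8, 12]))   # 14.0
    # One oversized drive cannot be fully used: it has only 4 TB of partners.
    print(pooled_two_replica_capacity([2, 2, 16]))      # 4
```

In the first example the pooled layout gains 2 TB because the 12 TB drive's extra space can pair with the other drives instead of being wasted next to the 8 TB one.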

loeg 882 days ago

ZFS and Btrfs have some capability for this.

m463 883 days ago

lol, wrong place to ask questions of such practicality.

that said, I played with virtualization and I didn't need to.

but then I retired a machine or two and it has been very helpful.

And I used to just use physical disks and partitions. But with the VMs I started using a volume manager. It became easier to grow and shrink storage.

and...

well, now a lot of this is second nature. I can spin up a new "machine" for a project and it doesn't affect anything else. I have better backups. I can move a virtual machine.

yeah, there are extra layers of abstraction but hey.

iwontberude 883 days ago

It's cool to cluster everything for some people (myself included). I see it more like a design constraint than a pure benefit.

erulabs 883 days ago

So that when you do have to, you know how to do it.

loeg 883 days ago

I think most of us will go our whole lives never having to deploy Ceph, especially at home.

erulabs 883 days ago

You’re absolutely not wrong, but asking a devops engineer why they over-engineered their home cluster is sort of like asking a mechanic “why is your car so fast? Couldn’t you just take the bus?”
