by gravypod 2340 days ago
I've seen a lot of the hacker community focusing on btrfs and zfs but very little focusing on ceph. I think ceph has a lot of the features that we want in a file system, plus some things that aren't even possible on traditional file systems (per-file redundancy settings), with very few downsides. The setup is a little more complex, involving a few daemons to manage disks, balance, monitor, etc. I wish there were something similar to FreeNAS for ceph that focused only on making the experience seamless, because I think if it became more popular in the home lab space we'd see lots of cool tools pop up for it.
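For anyone wondering what "per-file redundancy settings" could look like in practice, here is a minimal sketch. It assumes CephFS file layouts: redundancy is a property of a pool, and a file or directory can be pointed at a pool through the virtual xattrs `ceph.file.layout.pool` and `ceph.dir.layout.pool`. The helper name `pin_to_pool` and the pool names are hypothetical, the pool has to be added to the filesystem as a data pool beforehand, and a file's layout can only be changed while the file is still empty.

```python
import os


def pin_to_pool(path, pool, setter=None):
    """Ask CephFS to store `path` in the data pool named `pool`.

    Hypothetical helper. `setter` defaults to os.setxattr (Linux only) and
    is injectable so the call can be dry-run without a Ceph mount.
    """
    # Directories and files use different virtual xattr names. A directory
    # layout only affects files created in it afterwards.
    kind = "dir" if os.path.isdir(path) else "file"
    attr = f"ceph.{kind}.layout.pool"
    (setter or os.setxattr)(path, attr, pool.encode())
    return attr


if __name__ == "__main__":
    # Dry run: print what would be set instead of touching a filesystem.
    # "/mnt/cephfs/photos.tar" and "replicated_3x" are made-up examples.
    pin_to_pool("/mnt/cephfs/photos.tar", "replicated_3x",
                setter=lambda p, a, v: print(p, a, v))
```

On a real CephFS mount you would drop the `setter` argument, so that irreplaceable files land in a heavily replicated pool while bulk data goes to a cheaper one.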
1 comments

I love Ceph, I even wrote an intro about it for those who are not familiar with it.

https://louwrentius.com/understanding-ceph-open-source-scala...

But Ceph is not designed to be a competitor to BTRFS or ZFS. The core vision of Ceph is scalability. If you need petabytes of storage and the performance to scale with it, take a look at Ceph.

I may be totally wrong here, but from what I understand about Ceph, it's not meant as a file system for a single computer. I don't understand the idea of running Ceph on your laptop/desktop. It's possible to run it that way, but it defeats its purpose.

I've built a small lab setup with Ceph:

https://louwrentius.com/my-ceph-test-cluster-based-on-raspbe...

Also, there's the issue of performance, in particular latency. That's a bit of a weak spot of Ceph, from what I can tell. Again, I may be wrong. But I found these notes interesting.

https://yourcmc.ru/wiki/Ceph_performance

This.

In fact, it's really common to use a ZFS array on single nodes, and then create a SAN using multiple such machines by layering Ceph on top.

That's interesting, but it's layers upon layers... (RIP latency), I think. Unless it's about just bandwidth and volume, then latency is not that big of a deal.
You don't have to use ZFS snapshots. I haven't run a system like this in production, but presumably you choose ZFS because it's flexible in how you configure the arrays (as is, say, LVM) and because it supports checksumming.
I never really store data on my local machines anymore. All of my data is either hosted only on, or backed up to, my storage server. I think the selling point of ceph is that every server in my apartment can be part of my storage cluster, and data I really want to avoid losing can be persisted across all of them.

For me latency isn't really a large issue. I read and write everything locally on my SSD-backed desktop/laptop and then sync my files to my storage node via git or rsync or something. For me data integrity and availability are important.
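As a rough illustration of the "work locally, then sync" flow, here is a small sketch of an rsync wrapper. The function names, the example paths, and the host name `storage` are all hypothetical. The flags used (`--archive`, `--checksum`, `--delete`) are standard rsync options.

```python
import subprocess


def rsync_command(src, dest, checksum=True, delete=False):
    """Build the argv for a one-way rsync from src to dest."""
    cmd = ["rsync", "--archive"]
    if checksum:
        # Compare file contents by checksum, not just size and mtime.
        cmd.append("--checksum")
    if delete:
        # Remove files on dest that no longer exist in src.
        cmd.append("--delete")
    cmd += [src, dest]
    return cmd


def sync(src, dest, **opts):
    """Run the sync and raise CalledProcessError if rsync fails."""
    subprocess.run(rsync_command(src, dest, **opts), check=True)


if __name__ == "__main__":
    # Made-up paths: push a local project directory to the storage node.
    print(rsync_command("projects/", "storage:/tank/projects/"))
```

Leaving `delete` off by default means a file removed by mistake on the laptop still survives on the storage node until you clean it up on purpose.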