by MPSimmons 987 days ago
I've run Rook/Ceph, and I run Longhorn right now. I wish I didn't, and I'm actively migrating to provider-managed PVs.

My advice for on-prem is to buy storage from a reliable provider with a decent track record in hybrid flash/HDD arrays, so that you can take advantage of storage tiering (unless you just want to go all-flash, which is an option if you have the money).

If you must use some sort of in-cluster distributed storage solution, I would advise you to exclude members of your control plane from taking part. I would also dedicate entirely separate drives and volumes to the distributed storage, so that normal host workload doesn't compete with it for I/O and add latency.
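
To make that selection rule concrete, here is a rough sketch in Python. It only models the decision and does not talk to a real cluster. The node dictionaries, the `dedicated_disks` field, and the `storage_candidates` helper are made up for illustration; the one real identifier is the standard `node-role.kubernetes.io/control-plane` label that Kubernetes puts on control plane nodes.

```python
# Well-known Kubernetes label present on control plane nodes.
CONTROL_PLANE_LABEL = "node-role.kubernetes.io/control-plane"


def storage_candidates(nodes):
    """Return names of nodes that may host distributed-storage replicas.

    A node qualifies only if it is not part of the control plane and has
    at least one drive set aside for storage (hypothetical inventory field).
    """
    eligible = []
    for node in nodes:
        labels = node.get("labels", {})
        if CONTROL_PLANE_LABEL in labels:
            continue  # keep storage I/O away from the control plane
        if not node.get("dedicated_disks"):
            continue  # no separate drive, so storage would share host I/O
        eligible.append(node["name"])
    return eligible


# Hypothetical inventory, purely for illustration.
nodes = [
    {"name": "cp-1", "labels": {CONTROL_PLANE_LABEL: ""}, "dedicated_disks": ["/dev/sdb"]},
    {"name": "worker-1", "labels": {}, "dedicated_disks": ["/dev/sdb"]},
    {"name": "worker-2", "labels": {}, "dedicated_disks": []},
    {"name": "worker-3", "labels": {}, "dedicated_disks": ["/dev/sdb", "/dev/sdc"]},
]

print(storage_candidates(nodes))
```

In a real cluster you would express the same idea with node labels, selectors, or taints in whatever form your storage system supports, rather than with a script like this.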


Good points, thanks. What makes you wish you didn't use Rook/Ceph/Longhorn?

In a professional setting, and depending on scale, I'd probably rely on a storage provider to manage this for me. But since this is for my homelab, I am interested in a DIY solution. Partly as a learning experience, to be sure, but ideally it also shouldn't cause maintenance headaches.

Keeping separate volumes makes sense. I can picture three tiers: SSDs outside of the distributed storage, dedicated to the hosts themselves; SSDs that are part of the distributed storage, dedicated to the services running on k3s; and HDDs for the largest volume, dedicated to long-term storage, i.e. the NAS part. Eventually I might start moving to SSDs for the NAS as well, but I currently have a bunch of HDDs that I want to reuse, and performance is not critical in this case.
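
Written down as data, the three tiers could look something like the sketch below. The `Tier` class, the purpose names, and the `tier_for` helper are hypothetical, just a way to pin the layout down. Whether the NAS tier would be part of the distributed storage isn't decided above, so it is left as `None` here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    media: str                    # "ssd" or "hdd"
    distributed: Optional[bool]   # None means "not decided yet"
    used_for: str


# Hypothetical names for the three tiers described above.
TIERS = {
    "host": Tier("ssd", False, "the hosts themselves"),
    "services": Tier("ssd", True, "services running on k3s"),
    "nas": Tier("hdd", None, "long-term storage"),
}


def tier_for(purpose: str) -> Tier:
    """Look up the tier for a purpose, failing loudly on unknown names."""
    try:
        return TIERS[purpose]
    except KeyError:
        raise ValueError(f"unknown storage purpose: {purpose!r}") from None


for name, tier in TIERS.items():
    print(name, tier.media, tier.distributed, tier.used_for)
```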

>Good points, thanks. What makes you wish you didn't use Rook/Ceph/Longhorn?

It seems like my volumes are constantly falling into a degraded state and then rebuilding. Resizing a volume requires taking down the attached workload, and then it seems to take forever (15+ minutes) for my clusters to figure out that the old pod is gone and a new pod is trying to attach.

Really, it's a PITA, and all of the providers' storage classes seem better than Longhorn. I had less experience with Ceph, but I hit very similar problems: long-gone pods held locks on PVCs that had to be manually expunged, or I had to wait out a very long timeout.

I've had similar issues with Mayastor (another in-cluster storage solution). It's under heavy development, so I've assumed the more mature options were better.

I'm working on v2 of my homelab cluster, and I'm going with plain old NFS to a file server with a ZFS pool. Yes, I will have a single node as a point of failure, but with how much pain I've had so far I think I'll be coming out ahead in terms of uptime.