by Cixelyn 1776 days ago
Curious -- any reason you didn't just go with a single-machine export + expansion disk shelves on something like ZFS? Installing a MinIO gateway would also act as a bare drop-in for S3.

Asking since we're in the same position as yourself w/ high double-digit disks trying to figure out our plan moving forward. Right now we're just using a very large beefy node w/ shelves. ZFS (via TrueNAS) does give us pretty good guarantees on failed disks + automated notifications when stuff goes wrong.

Obviously a single system won't scale past a few hundred disks so we are looking at alternatives including Ceph, GlusterFS, and BeeGFS. From the outside looking in, Ceph seems like it might be more complexity than it's worth until you hit the 10s of PB range with completely standardized hardware?

2 comments

Some of our rendering processes take multiple days to complete, and the blackbox software we use doesn't have a pause button. So it's not that we need 99.99999% uptime, but there's never a moment when rebooting a machine would be convenient (or, indeed, wouldn't cost us money). Being distributed over nodes means I can reboot them without disrupting the processes.
For k8s there is also Kadalu, btw, which is based on GlusterFS but simplified.