by linsomniac 891 days ago
We use ganeti and I'm ridiculously happy with it.

When I came on board, we were using ganeti for dev/stg and VMWare for production. But VMWare was hard to monitor (we were moving away from SAN to local storage, and setting up RAID array monitoring was a PITA), awkward to administer (via a Windows GUI, which I had to run in a VM on my Linux workstation), and had weird licensing (clusters of 5 were a sweet spot; anything larger shifted the price dramatically).

So I eventually shifted our production to ganeti as well, because it had been so solid in dev/stg. It's all manageable from the Linux CLI, and it works really, really well. It's basically a management layer on top of kvm+qemu+drbd+ceph. https://ganeti.org/

The other popular option, which I ran at my previous job, is Proxmox. It is probably a more comfortable analog for VMWare users. https://www.proxmox.com/en/proxmox-virtual-environment/overv...


>Windows GUI

I assume you're referring to the (ancient) VIC? vSphere has been entirely web-based for a long time now. Your install had probably just never been upgraded.

Also, I'm curious why you moved to local storage. What do you do if a host dies?

Good to know it's got something web-based now. Does the licensing still have that 5-machine sweet spot?

Our answer to a machine failing was to design our apps to be resilient. Basically everything we run can survive machine failures, either through app design or through corosync/pacemaker.

We're a pretty small shop, but we ran an experiment with a SAN (an HP of some sort), and every year, like clockwork, the redundant SAN would fall over and take our whole stack with it. And every year, like clockwork, HP would say "you aren't running the latest firmware, try this one". EqualLogic at another job was super reliable, but it was also easily twice the price of the HP.

The simplicity and redundancy of local storage have largely been a huge win. We did have a couple of Dell machines where the drive arrays seemed to fall over, possibly because of too much IO, but Dell identified a particular SSD as the culprit, and the arrays have been solid for the 3-4 years since.

I used ganeti 10 years ago at a company I was at. It was really great then. Glad to see it's still being worked on.
It's mostly in maintenance mode right now, but that's kind of fine because it is pretty solid. I would like a better ZFS storage story, but it does have great DRBD, LVM, and Ceph stories.
Docs page: "Last updated: Jan 4, 2021."