by Flere-Imsaho 3 days ago
Probably Proxmox. Veeam support is relatively new.

> Probably Proxmox. Veeam support is relatively new.

As a sysadmin of Proxmox, I do not see how it can scale to 40k VMs. The Proxmox folks themselves have seen "~24" nodes in a cluster (theoretical support is higher), so you'd probably need a lot of clusters for 40k:

* https://forum.proxmox.com/threads/proxmox-with-48-nodes.1746...

For such a size (and sticking strictly with open source), XCP-ng could be an option, or OpenStack. In the closed source space, Nutanix.

As of 2021, CERN had 35k instances/VMs in their OpenStack implementation:

* https://superuser.openinfra.org/articles/scaling-bare-metal-...

Proxmox for 40k VMs would be surprising. Also, Veeam does support Proxmox.
I would assume that this is not one monolithic cluster of 40k VMs but at least tens of clusters, which puts it within Proxmox's capabilities.
Before my vacation we (three colleagues and I) completed an 8-month migration (coordinating with stakeholders takes longer and is more complex than migrating a 192TB VM!) to 6 Proxmox clusters, so 20 to 40 clusters for 40k is certainly possible, but IMO it would be unwieldy.
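
For anyone who wants to sanity-check the "20 to 40 clusters" estimate, here is a rough back-of-the-envelope sketch. It assumes the ~24 nodes per cluster mentioned upthread; the VMs-per-node densities are hypothetical numbers picked for illustration, not measurements from any real deployment.

```python
import math


def clusters_needed(total_vms: int, vms_per_node: int, nodes_per_cluster: int) -> int:
    """Number of clusters required at a given (assumed) VM density per node."""
    vms_per_cluster = vms_per_node * nodes_per_cluster
    return math.ceil(total_vms / vms_per_cluster)


def implied_vms_per_node(total_vms: int, clusters: int, nodes_per_cluster: int) -> float:
    """Average VM density per node implied by a given cluster count."""
    return total_vms / (clusters * nodes_per_cluster)


TOTAL_VMS = 40_000        # figure discussed in the thread
NODES_PER_CLUSTER = 24    # "~24" nodes per cluster, as quoted upthread

# Hypothetical densities: 84 and 42 VMs per node.
print(clusters_needed(TOTAL_VMS, 84, NODES_PER_CLUSTER))
print(clusters_needed(TOTAL_VMS, 42, NODES_PER_CLUSTER))

# Working backwards from the 20 and 40 cluster figures.
print(round(implied_vms_per_node(TOTAL_VMS, 20, NODES_PER_CLUSTER), 1))
print(round(implied_vms_per_node(TOTAL_VMS, 40, NODES_PER_CLUSTER), 1))
```

Under those assumptions, 20 to 40 clusters works out to roughly 40 to 85 VMs per node, so whether it is plausible depends mostly on how heavy the VMs are.
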
> 192TB VM

Why? Honest question: what leads to that kind of size, and why can't it use NAS shares or SAN disks for most of that data? Kudos on the migration!

I wish I knew... At least it was a span volume, so we could use Proxmox's support for VMDK and get down to about 30 minutes of planned downtime, but it took a week of Storage vMotion followed by another week of the Proxmox equivalent.

When I get back to work I will finish the Samba configuration of the Ceph cluster front-ends to replace those elephants.

Would you do it again (Proxmox)?
The FC multipathing was a learning experience, and the manual workload placement requires good metrics on your workloads. The built-in Ceph is decently configured and more performant than FC if you have 100 Gb/s Mellanox NICs and an adequate amount of RAM. Veeam integration is serviceable, but it's not as mature and polished as the integration available on VMware.

Having tried Azure Local before (it seems magical, but the more you use it the worse it gets: updates failed for no apparent reason on only some supposedly identical nodes, and the SDN was atrocious to deploy and was manageable from WAC only), I would recommend Proxmox over it any day.

If you don't have Linux expertise on hand and have traditional FC-based storage, I would recommend something else, probably Nutanix if your budget is big enough.

And if you had to do it on a normal human setup of three hosts and 15 total VMs: no fiber, no SAN, no special NICs or anything like that, just tied in to an ordinary Veeam with the SSD storage on an immutable Linux machine elsewhere?
I’d guess thousands of clusters. They have over 3k retail stores in the UK, so that could be a 2-3 node cluster in every one.

I’ve worked with a few major US grocers on very similar projects (some hardware-only refreshes and one VMware to Hyper-V/Azure Local migration).
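
The one-cluster-per-store guess above can be checked with some quick arithmetic. This is only a sketch: it takes the 40k VM figure from the thread and rounds "over 3k stores" down to exactly 3,000, which is an assumption, so the real per-store numbers would be somewhat lower.

```python
def per_store_load(total_vms: int, stores: int, nodes_per_store: int) -> tuple[float, float]:
    """Return (average VMs per store, average VMs per node) for a per-store cluster layout."""
    vms_per_store = total_vms / stores
    return vms_per_store, vms_per_store / nodes_per_store


TOTAL_VMS = 40_000   # figure discussed in the thread
STORES = 3_000       # assumption: "over 3k" rounded down to 3,000

for nodes in (2, 3):  # the 2-3 node cluster sizes guessed above
    per_store, per_node = per_store_load(TOTAL_VMS, STORES, nodes)
    print(f"{nodes} nodes/store: {per_store:.1f} VMs per store, {per_node:.1f} VMs per node")
```

That comes out to about 13 VMs per store, which is in the same range as a small three-host setup and far from a single large cluster.
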