The Broadcom business model (outside the chip business) had been pretty well known, and they don’t really hide it.
They are tech bottom feeders. They find large businesses with a decent moat and free cash flow that are in long-term decline (and wasting cash trying to find something new). They buy them, then cut development, support and marginal products, raise prices and squeeze as much as they can.
It’s the same modus operandi as private equity but worse, because Broadcom has the money and technical resources to do interesting things with the technology, but they don’t.
Broadcom is publicly listed with a public float of about 98% (i.e. 98% of its shares are listed publicly).
You're right that most shares are held by institutions (~80%), but that typically reflects the fact that most share ownership by individuals/companies goes through intermediaries (401k, fund investments, ETF etc.). Most of this institutional ownership is just asset managers, insurance, banks etc. taking their cut before passing returns/losses through to the end risk holder. The average institutional ownership of companies in the S&P500 for example is also ~80%.
None of this takes away from the point that Broadcom is absolutely run like a PE firm as the original commenter noted.
Not surprising given the CEO was appointed by KKR/Silverlake 20 years ago.
They do keep investing in their software "just enough" to keep customers from churning.
These sorts of lawsuits never really make it to court; they're a negotiation tactic.
Meanwhile Broadcom's software revenue, led by VMware, keeps growing 30% YoY and they close new contracts in spite of being expensive, because there are very few true alternatives in the market to VMware Cloud Foundation at high scale. Nutanix, Proxmox, Azure Stack, and OpenShift all exist but have their own problems. I've worked with them all, and they're all... big, expensive and difficult, though VMware is probably the most stable and hassle free of the bunch. It just costs a lot.
Revenue may be increasing, but their customer base is decreasing, and any customer who's paying attention is now looking for an exit strategy.
Yes, the alternatives have their problems, but ESXi/VMware/VCenter/VSphere have a lot of problems too. I will disagree with your claim that "VMware is the most stable and hassle free of the bunch." I spec'ed, installed, and ran a VMware cluster for a few years and it was never very stable. After a while I stopped installing the software updates because they would usually break something.
More than once, after applying an update, I had to re-install the licenses for each server and its associated CPUs, which is a painful process. We initially installed using an external DNS, but the cluster was so flaky that we had to switch to their recommended configuration of local DNS. There was a never ending stream of security vulnerabilities, so you were highly incentivized to patch, but it never got any better.
> Revenue may be increasing, but their customer base is decreasing, and any customer who's paying attention is now looking for an exit strategy.
VMware in 2024 had 96% of the virtualization market and 500k customers. The above is somewhat true but also kind of like saying "the USA is in decline"... okay, but it's so big that it's going to take a very long time, and not every arrow is pointing down.
Broadcom focusing on higher margin larger customers hurts the 10+ year horizon but at the same time they're closing massive (9 figure) deals 5 years out including some very large expansions. Everyone is going to look for an exit - as they should! - but that doesn't mean they actually WILL exit.
(I don't work for VMware, though I did years ago, I am an independent).
> I spec'ed, installed, and ran a VMware cluster for a few years and it was never very stable. After a while I stopped installing the software updates because they would usually break something.
I would gently suggest this isn't really much of an anecdote. This is like saying "I ran Linux once and it never seemed stable, so I stopped updating it".
There's VMware customers that range from a dozen VMs on a cluster to literally hundreds of clusters each with 10-20 hosts each with 100+ cores and 2 TB+ RAM and thousands of VMs... adding up to 500k+ VMs at the largest customers.
> More than once, after applying an update, I had to re-install the licenses for each server and its associated CPUs, which is a painful process.
This is not something I have encountered in 20+ years, nor could I find a KB article about it online to indicate it was a widespread issue (though maybe it was, if you have a link). Broadcom moved all licenses to subscription recently, which caused issues here, but otherwise this feels odd.
> We initially installed using an external DNS, but the cluster was so flaky that we had to switch to their recommended configuration of local DNS.
I am not sure I understand. What do you mean by local vs external DNS? I am familiar with Kubernetes clusters having local CoreDNS and a plugin for plumbing external records called External DNS, however these aren't vSphere concerns. vSphere uses standard NTP and DNS and doesn't ship with a DNS server; it doesn't have any recommendations on where or how it runs other than it being highly available.
It's a long story: OpenShift was a bad product all around until 2019 with v4, the 3rd rewrite, but that product was a home run. That in itself was an incredible turnaround, even before they moved away from OpenStack and also turned OpenShift into a VM platform.
Mostly the other problems are the typical problems of managing a bare metal multi-tenant Kubernetes cluster. The customers that don't have as many of these problems are ironically running OpenShift on vSphere ;).
While the OCP operators and GUIs cover much of the usual day to day, you really need deep Kubernetes expertise at scale, and need to drop down to the upstream project code and docs. For example, it is very hard to force configuration discipline on tenants (leading to many flowers blooming here, like Kyverno); security in Kubernetes is complex and requires careful tradeoffs on policies; it is laborious and counterintuitive to manage compute capacity and noisy neighbours (requests vs limits, i.e. you should always set requests and be very careful setting limits); Submariner and OVN-Kubernetes network services are limited compared to HCX+NSX (e.g. NAT topologies, distributed firewall management, tunnels, fabric connectivity i.e. VRFs or EVPN support, though this is coming soon... also OpenShift's MetalLB for ingress load balancing is its own thing with its own connectivity config); out-of-the-box observability is not very good and requires 3rd party solutions or extensive customized configurations; and the Kubernetes scheduler itself is focused on efficient bin packing rather than workload stability.
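To make the requests-vs-limits point a bit more concrete: Kubernetes derives a QoS class for each pod from its requests and limits, and that class influences which pods are evicted first when a node is under pressure. Below is a rough sketch of that classification for a single-container pod. The function name and the dict layout are made up for illustration, and the real rules look at every container in the pod, so treat this as an approximation rather than the actual implementation.

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Approximate the Kubernetes QoS class for a single-container pod.

    Simplification: real Kubernetes evaluates all containers in the pod.
    """
    resources = ("cpu", "memory")
    # Mirror the defaulting rule: a limit with no request implies request == limit.
    effective_requests = {r: requests.get(r, limits.get(r)) for r in resources}

    # Nothing set at all: first in line for eviction.
    if not any(r in requests or r in limits for r in resources):
        return "BestEffort"
    # Both resources limited, and requests equal to limits.
    if all(r in limits and effective_requests[r] == limits[r] for r in resources):
        return "Guaranteed"
    # Anything in between.
    return "Burstable"
```

Under this reading, "always set requests, be careful with limits" lands a workload in the Burstable class: the scheduler reserves capacity for it, but it is not throttled or killed at an arbitrary ceiling unless you choose to add one.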
Also, replacing vSphere VMs with OSV, you lose DRS, which is a big blow... you do keep vMotion live migration equivalence, but you must use a NetApp Filer (or any NFS store) for your VMs, or Nutanix Files, or ODF/Ceph in RWX volume mode. ODF/Ceph is more laborious to manage than vSAN (it requires its own knowledge as well), but importantly has native S3 object storage, which vSAN still is missing (though I hear it is imminent in VCF 9.1.2). VLAN assignment to VMs with NMState and multi-NIC failover has gotten better here over the years with OCP, though it feels shakier (more complexity is exposed, LACP is required, etc.) than the VMware distributed switch's native load-based NIC teaming or NSX.
Overall, if you squint, OpenShift can replace much of vSphere on paper, and at least somewhat in practice, but you really, really need a sharp ops team that knows what they're doing and at least some 3rd party solutions for capacity and observability. I'm also not sure Red Hat education and consulting is scaled at the level required to build these skills in industry quickly enough, though IBM certainly has the qualifications to do so. That said, Broadcom is also doing plenty to squeeze or shed its education and consulting to partners, which is... a mixed bag at first that usually doesn't end well, and leads to repatriation.
Being told you are carrying a scorpion across a river is not a lot of help halfway across.
I think however this mostly speaks about how hard it is to run anything even vaguely securely. There are so many independent CPUs and OSes on one motherboard plus HDD these days …
And it has no US comparison. The Tesco meal deal concept, the literal wall of choices, just doesn't exist in North America.
I did a big work trip to the UK a couple years back with over 100 people. I tried to explain meal deals and nobody believed me. Then our people basically stripped the meal deal shelves of the Tesco Express beside our hotel.
Meal deals are in every supermarket in the UK. Petrol stations even do them.
Also, as a foreigner who lives over there, I think they are... sad? I'm surprised they got a positive reception from your coworkers. For me they are a backup and a failure to do something more interesting.
What people don't realise is the startups here in the UK run on miserable sandwiches, tasteless crisps and energy drinks. Middle management lives on slightly more expensive platters from Pret.
Unless you live in bumfuck nowhere there's zero reason to be subjecting yourself to a supermarket meal deal; we've got an overabundance of independent food places in towns in the UK.
This is one of those things that varies by cultural cachet rather than actual quality. It's not that different from people living off Japanese konbini, but those are perceived as much cooler.
Most cities will have local sandwich options as well near major office districts, but they might not be as cheap.
One supermarket close to me has a wall of various pre-prepared lunch items and I think there's a deal if you get the soup as well.
Another chain shop has a smaller selection of items... But does have the '5$ cluck' on thursdays, i.e. a pre-cooked hen for 5 USD. Grab a bag of good add-water mashed potatoes or some corn and/or perhaps a veggie, and you can get a proper dinner for at least 2, maybe 3 people out of it for under 15$.
A sandwich, bag of crisps and a drink for £5 is an actual deal. Sandwich alone in U.S. would be $10 and the “$15 Meal Deal” just doesn’t have the same ring to it.
Gosh, it used to be £3 not that long ago. About £5 for a wrap at Prêt if I couldn't be bothered to go fight with the tourists to cross the road down Kingsway.
There's a lot to complain about in the UK, but food price/quality is actually pretty good. Not the absolute best, but far from the worst and certainly not Scandinavian prices.
Mmh, you can get 3 el cheapo sandwiches for 1.99€, a 100g bag of chips for 0.99€ and a liter of water for 0.90€ or flavoured /coke for 1.99€ in Germany
Considering a £ is worth more than a €, supposedly at least, it doesn't sound like a good deal to me
Incorrect. Maybe you're only familiar with the high cost of living areas. There are similar $5 deals in the United States. The US is a big place and has many, many businesses offering very similar deals.
If you want a shitty sandwich you can find it for $5 in the US no problem. Plus some variation of the sausage roll that will clean you out just as well.
The "main" has expanded to Huel, salads, wraps, sushi, even hot food
The "snack" can be more than crisps: small bags of fresh chicken, 2 boiled eggs, small sushi pack, gyozas etc
The "drink" includes quality smoothies, acceptable vending machine coffee etc
Meal deal value maximizing is the whole game lol. There are also lots of healthier options if you choose carefully
In certain Sainsbury's you can get hot food as the main such as a small green curry or chicken goujons, and wedges or hash browns as the side
But the price creeps up £0.50 practically yearly. I think it's £5.50 already in Sainsbury's
It's better to view it as a cheaper alternative to eating at a restaurant rather than somehow saving money compared to bringing in leftovers. People who think £5.50 a day for lunch is saving money versus cooking themselves are delusional
$6.5 is about what you'd spend to get a bag of chips, hot or cold sandwich and drink from any Walmart that's been renovated recently enough to have a "Grab and go" or whatever they're calling it.
A sandwich, a bag of crisps, and a drink at the grocer near me is $8. I don't exactly live in a super low cost of living area, nor is it one of the most expensive in the US.
Not ashamed to say that visiting the UK again for the first time in about 12 years, getting some Boots/Tesco/Greggs meal deals was on my to-do list. Something I missed moving to Australia. Not that they are _good_, just that they are so readily available, cheap, and have a lot of choices. Woolworths in AU have started doing premade sandwiches, but they are just bad, and don't come in a deal. I doubt they would even try to hit the A$10 price range if they ever did introduce deals.
Garment section is also amazing, british ppl are so classy while having timeless thick pieces
I ended up flying back home with some oxford shirt from the Tesco, and it's really cool (vneck pull over - tie - shirt sets were sold out with my size unfortunately)
It really grinds my gears that the buying companies take out the debt for the takeover against the acquired companies themselves.
So many well-known UK companies have been sunk by debt interest on loans taken out to acquire said companies.
By all means use the companies to secure loans, but the liability should be on the books of the parent companies not the companies being acquired!
There have even been cases where the companies have been effectively asset-stripped by "sell and lease back" of property, leaving the companies a shell of their former selves with no meaningful assets, so as soon as there are any unexpected headwinds they collapse.
It's the bank's problem. The bank is supposed to determine whether it's likely to be repaid in full, and if not, don't issue the loan which blocks the sale.
I worked in software acquisitions for a large organization and it was really eye opening to see how insane some of these companies are when it comes to pricing customers out. I always wondered - what is the motive? They make pricing structure changes that aren't even worth considering for any organization that has any kind of budget. VMware was one example where our already insane costs that had nearly tripled over the previous 4 years were quoted to triple at the end of the period.
Another was a Java SE licensing change. It went from around $1k per instance, of which we had about 5. Mind you, there is little to no maintenance support provided here. The increase was to $5.25 per organizational employee per instance, whether they used the instance or not, and we had 100k employees. The choice was obviously a simple one.
I can only assume very few organizations stay on the ride for those kinds of changes, but obviously they must - but why?
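For a sense of scale, here is the back-of-envelope arithmetic on those Java SE numbers. This is only a sketch using the figures quoted in the comment; the billing period is not stated there, so the result is "per billing period" whatever that was, and the function names are invented for illustration. Since "per employee per instance" could be read two ways, both readings are shown.

```python
def old_cost(instances: int, price_per_instance: float) -> float:
    """Old model as described: a flat price per instance."""
    return instances * price_per_instance


def new_cost(employees: int, price_per_employee: float, instances: int = 1) -> float:
    """New model as described: priced per employee (per instance), used or not."""
    return employees * price_per_employee * instances


before = old_cost(instances=5, price_per_instance=1_000)
# Reading 1: every employee is charged for every instance.
after_per_instance = new_cost(employees=100_000, price_per_employee=5.25, instances=5)
# Reading 2: every employee is charged once, regardless of instance count.
after_once = new_cost(employees=100_000, price_per_employee=5.25)

print(f"before: ${before:,.0f}")
print(f"after, per employee per instance: ${after_per_instance:,.0f} ({after_per_instance / before:.0f}x)")
print(f"after, per employee counted once: ${after_once:,.0f} ({after_once / before:.0f}x)")
```

Either way the increase is two orders of magnitude or more, which is why the choice was a simple one.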
It might take a large org several years to migrate off core systems like VMWare. If you think the customer is likely to churn within a few years anyway it makes economic sense to hike their fee.
At any one time, something like 90% of all enterprises are engaged in at least one multi-year strategic move away from an abusive vendor. In the tech world, these might be Oracle, Broadcom, (formerly) IBM, or (even more formerly) Computer Associates.
Typically you're looking at a year or two of discovery, audits and planning, another year or two to cover the main transition, and then up to five years of mopping up.
There are other near-ubiquitous vendors (eg. Microsoft and Cisco) who manage to be tolerated as annoying rather than outright abusive. I guess they take a slightly different view of how hard to squeeze their customers.
I did a gig at a Fortune 500 that had actually succeeded in entirely eliminating Oracle. Life was still miserable.
They lived in fear of something slipping through the net. So print servers were switched off because they contained an embedded Oracle JRE. And deployment pipelines that used Hashicorp's Packer had to be rewritten to eliminate the VirtualBox plugin (despite it not being used). Office coffee machines were looked at with suspicion.
Every vendor had to be queried, every piece of software had to be tested and have appropriate controls put in place. There were pre-emptive audits and endless compliance procedures.
There was so much work involved that any cost savings must have been fairly minimal.
It's hard (=expensive) to change all the internal infrastructure or sometimes even internal processes, and if companies manage to stay just a bit cheaper than their customers' cost to rewrite "everything", they'll get the money. Even if some customers do leave, with the price hike they still earn more from the ones who don't.
Our MSP refuses to consider ProxMox because “we need to support it”… but are happy as clams to throw me outrageous HyperV labor costs.
They’re literally putting me in a position where I either need to fire them because they refuse to use an open source solution and hire people that can read code… or fire them because they want 50,000 to move 15 VMs over to HyperV.
I want an MSP that isn’t scared of things outside of Microsoft.
It's honestly been easy as hell to support and upstream support contracts are available for cheap as chips, like a grand a socket a year or less. An MSP could partner with them to offer enhanced support if they were smart! It's just KVM / Qemu / Ceph on Linux, plenty of 3rd parties can provide support for it... just go spin up a cluster in a trio of VMware machines with nested virtualization turned on and take it for a spin.
Having done this to a number of clusters, it works really well; TLDR you can have it as simple as mounting the remote ESXi as a datastore and just migrating it straight over in the UI. Or automate it if you have 40,000 of them as Tesco does, though tbqh I'm not sure Proxmox is the solution for that scale of workload.
> Tesco is also dealing with migration challenges related to data security because its new, unnamed virtualization software is incompatible with the Veeam and Zerto products it uses.
What is a VMware alternative, that isn't compatible with backup software? I'm guessing it's not nutanix?
I'm having flashbacks from the late 90s/ early 00s when your company would hire a "Linux guy" that would force a large scale migration to some open source stack no one heard of, then only later worry about if any existing applications worked.
Currently in Finland, a major public health provider is moving to chromebooks. By the end of 2026. They won’t even have the test environments ready before Q3 2026.
> Probably Proxmox. Veeam support is relatively new.
As a sysadmin of Proxmox, I do not see how it can scale to 40k VMs. The Proxmox folks themselves have seen "~24" nodes in a cluster (theoretical support is higher), so you'd probably need a lot of clusters for 40k.
I would assume that this is not a monolithic cluster of 40k VMs but at least tens of clusters. Which puts it in the realm of capabilities of Proxmox.
Before my vacation we (3 colleagues and myself) completed an 8-month-long migration (coordination with stakeholders is longer and more complex than migrating a 192TB VM !!!) to 6 Proxmox clusters, so 20 to 40 clusters for 40k is certainly possible, but IMO it would be unwieldy.
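A quick sanity check on those cluster counts, as a sketch only. It combines the "~24" nodes per cluster figure and the 20 to 40 cluster range from the comments above; the 50 VMs per node used in the second helper is a made-up density for illustration, not a number anyone in the thread gave.

```python
import math


def vms_per_node(total_vms: int, clusters: int, nodes_per_cluster: int) -> float:
    """Average VM density each node must carry for a given cluster count."""
    return total_vms / (clusters * nodes_per_cluster)


def clusters_needed(total_vms: int, nodes_per_cluster: int, density: int) -> int:
    """Clusters required if every node carries `density` VMs (hypothetical)."""
    return math.ceil(total_vms / (nodes_per_cluster * density))


for clusters in (20, 40):
    print(clusters, "clusters ->", round(vms_per_node(40_000, clusters, 24), 1), "VMs per node")

print("clusters at 50 VMs/node:", clusters_needed(40_000, 24, 50))
```

So the 20 to 40 cluster range corresponds to roughly 40 to 85 VMs per node at 24 nodes per cluster, which is why it looks feasible but unwieldy.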
We have migrated from VMware to Nutanix; well, we're over halfway there from an 8,000 VM pool. Nutanix has been a royal pain. LCM updates fubar'ing, firmware upgrades screwing up UUID numbers of disks so CVMs won't boot, VMs not handling VLAN NIC and VXLAN NIC together, our SANs just now being supported, the amount of bug fixes is weekly and impossible to keep up with, a team's DDoS'ing of us with their app caused distributed storage to reboot every VM in the cluster, etc etc etc... I hate Broadcom but Nutanix has been painful!
I don't love their pricing but it's not brutal like others I've seen. We run 3 clusters - one of which is an old legacy cluster that was our original and despite Nutanix not supporting it officially (meaning we don't pay for support on it), their support has still been helpful with it and like the other 2 clusters, uptime/stability has been rock solid.
Prism Central has definitely gotten better with the UI since the earlier days. I still prefer Prism Element in some cases, but overall it all works pretty well.
We use HYCU for backups and while I was really skeptical about it in the beginning, it is absolutely solid in a Nutanix environment. Overall we are happy with Nutanix.
It is, I was thinking more of the core of it. In the same way one might say modern Openshift is rebadged kubernetes, but I will admit it wasn't best word choice
I work at Red Hat and a customer moving 40k servers off VMware is a fairly regular occurrence. It'd be one of the larger migrations but certainly not unusual. We can usually do about 500-1000 guests per day once the migration is fully underway after the initial engagement and a qualification period where the VMs get scoped for anything unusual / difficult to move.
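Putting rough numbers on that rate, as a sketch: at 500 to 1000 guests per day, a 40k estate takes on the order of weeks to a few months once the migration is in full swing. The helper name is made up, and this deliberately ignores the initial engagement and qualification period mentioned above.

```python
import math


def migration_days(total_guests: int, guests_per_day: int) -> int:
    """Working days of steady-state migration, excluding ramp-up and scoping."""
    return math.ceil(total_guests / guests_per_day)


fast = migration_days(40_000, 1_000)
slow = migration_days(40_000, 500)
print(f"{fast} to {slow} days of steady-state migration")
```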
It's all based around open source projects virt-v2v and Migration Toolkit for Virt, and the typical target is OpenShift Virtualization.
There are various zero-copy options if you're using specific storage. In the best case the downtime for each guest can be as little as a few minutes. If the storage stars don't align then it can take a few hours per VM (but conversions happen in parallel, dozens or hundreds at a time).
[I don't have any specific knowledge about where this Tesco account is going. We have plenty of competitors. Everyone is dining at the Broadcom trough right now. Broadcom's "strategy" is absolutely baffling to me.]
>Broadcom's "strategy" is absolutely baffling to me.
I know plenty of Enterprise customers who cannot move easily and just renewed 3 year VMware licenses for their cluster at insane rates. They are planning on moving but I'd be shocked if they complete it. $LastCompany had a VMware footprint that I know will be very difficult to move off: deployments, monitoring, backups were all dependent on VMware. There are plenty of US Government entities who are not even considering it at this time.
If you look deeper into the migration article, it's pointed out that they are already facing migration challenges. I wouldn't be shocked if, 3 years later, there are some workloads still running on VMware that you can't easily get off, and they just renew at insane licensing cost for a much smaller hardware footprint.
The extortionate renewal rates I saw as a gift from Broadcom. It made it very easy to price the risk of doing nothing and be sure that the cost of outages during and immediately post-migration would be lower. (Yes, we had a few, due to obscure driver issues or an app that really wanted a specific CPU or chipset or virtual NIC, and they cost us less than 10%, probably closer to 5%, of what the proposed renewal would have cost.)
Yeah I'm at a place that is kind of sucking it up, but there is a work-stream to move more stuff into the cloud and another work-stream to move more stuff on-prem but Kubernetes running on bare-metal. There's also work to stop using some component of VMware as well.
I think Broadcom correctly realizes that no matter what they do there is no long term: In a world of Cloud hyperscalers and containerization, the absolute number of “traditional” virtual machines run by a commercial hypervisor has nowhere to go but down.
> Broadcom's "strategy" is absolutely baffling to me.
If one believes that they intend to get new VMware customers, or that they intend to have more than single-digit numbers of customers on VMware ten years from now, I can see how that might make their strategy baffling.
They appear to have made a lot of money doing what they're doing, so it looks to be working quite well for them... regardless of what the public or their former customers think about it.
I think your assessment is pie in the sky. I am moving hundreds of VM's per day and the amount of anchors attached to the source VM's is ridiculous. LB's that are mapped via VMware object, VLAN's extended for the migration but not working, SR-IOV enabled, etc etc....you may have the most perfect setup here but in real life I've never seen it that simple so I truly doubt what you're saying.
OpenShift Virt is fully open source under an Apache 2.0 license, so you now have legal options to move to a competitor or even manage it yourself (although I wouldn't recommend the latter, even I don't manage OSV myself).
No one is arguing that upstream Kubernetes pieces aren't open source. OpenShift is not open source. If it was, why does RedHat sell licenses for it, gate the binaries, and not share the byte-for-byte reproducible sources for said binaries?
Lots of orgs have been documenting their moves to KubeVirt over the past year or so. There are KubeCon video recordings on the YouTube channel from Amsterdam with lots of this kind of stuff, especially from European end users.
One thing I find consistent is orgs are also looking at the whole stack, this is just another major component of digital sovereignty.
Disclaimer: work for CNCF on this but worked on the first version of VMWare Tanzu so every announcement in this space is interesting lol.
Not just farmers, they are somewhat abusive towards their customers as well. It's been good watching Aldi/Lidl enter the market and put pressure on them.
What other reasonable choices exist for moving off VMware for a small to medium size organization? Nutanix and Citrix are just as expensive, and another platform capture. Proxmox is not ready for Enterprise, even as it gains traction from hobbyists. I work with Splunk, and its price is approaching the point of being unaffordable for most organizations. The logging and observability market is consolidating toward BigCo and I'm afraid eventually there's not going to be any choice left for small consumers and small players. The answer can't be "build your own" for every little adjacent tech you need to run a shop.
Good. Broadcom buying them was the death sentence for VMware. If it can't be reversed, the next best thing is to hasten it along. Nobody should be giving money to the likes of Broadcom.
I think it's a cultural thing over there. They came loaded for bear with their customers to convert them to the new ARR-based, non-perpetual product lineup.
I've negotiated a lot of contracts and renewals. I've been threatened twice - Oracle, and then Broadcom. We had perpetual licenses, but that didn't matter, according to them we were out of compliance and as a "courtesy" they delayed sending C&D as a precursor to suing us - this was the intro meeting call. There was no budging on price, and they actually priced the cheaper alternative we could have considered ("VVF") at like a 0.1% discount from their core "VCF" product, I think as a fuck-you. It was a great time, our reseller and I shared a drink over that one.
It does seem an odd move. No doubt they're going to milk existing customers for everything they're worth, but they're going to create a generation of people who will never buy anything from them ever again. That guy who's busting his balls to migrate off VMWare because of the price hike is gonna be the CTO in 10 years time, and when he's making that 10m USD purchasing decision they're gonna stay well away from anything with the name Broadcom on it.
I don't think that's true, a lot of the database market that exists is basically "we're not oracle but we did a thing they can also do for much cheaper"
Broadcom expects every customer to move off VMware eventually due to technology shifts. By jacking up the price 10x and cutting costs 70%, they can print money for a few years from customers that are either too risk-averse or too dysfunctional to switch to another product.
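The "print money" logic can be sketched in a couple of lines. Using the 10x figure from the comment above and a deliberately simple model (one uniform price, revenue equals price times customers, function names invented here), revenue only falls below the starting point once more than 90% of customers have left.

```python
def revenue_multiple(price_multiplier: float, retained_fraction: float) -> float:
    """Revenue relative to before the hike, assuming one uniform price."""
    return price_multiplier * retained_fraction


def break_even_retention(price_multiplier: float) -> float:
    """Fraction of customers that must stay for revenue to be unchanged."""
    return 1 / price_multiplier


print(break_even_retention(10))    # share of customers needed to break even
print(revenue_multiple(10, 0.5))   # revenue multiple if half the customers leave
```

The 70% cost cut only widens the margin further; it is left out here because the comment does not say what the cost base was.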
Possibly they’ll do enough brand damage that it turns out to be a negative ROI, but for now they’re printing money.
I don’t know how anyone can afford these migrations, especially for production on-prem workloads, without building literally duplicate sets of hardware clusters and then manually migrating workloads.
We usually reuse the VMware hardware and (most importantly) file storage. Some additional hardware is required temporarily so you can build out initial Openshift nodes. The VMware nodes are decommissioned and converted to OSV nodes as the conversion goes along. With some kinds of file storage (cough NetApp) the conversion is zero copy, the VM literally stays where it is. With others we will copy to new NFS storage areas which will be provisioned on the same physical hardware.
It's a very scalable and almost fun task once you get into it.
Alternatives to VMware can run VMware VMs almost immediately, by translating the configuration and with only a few (or sometimes no) changes to the guest. Usually those changes are scriptable. I've done it a few times, moving between VMware and KVM of Windows guests pretty much just worked; the rest was optimisation, i.e. guest driver changes, etc.
Live migration is not realistic between different hypervisors, but a very short downtime per VM is realistic if the new hypervisor can adopt the old disk images directly, which some can. If you want, you can convert formats in the background while the VM is running on the new hypervisor. E.g. KVM and things built on KVM can do all these things.
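For the simple offline case (guest shut down, disk converted, guest booted on KVM), the format conversion step is typically a single `qemu-img convert` call. The sketch below only builds the command line so it can be reviewed or batched; the file paths are hypothetical, and it assumes the source is a VMDK that `qemu-img` can read. The convert-in-the-background-while-running variant mentioned above would go through the hypervisor's own block copy mechanism instead and is not shown.

```python
from typing import List


def qemu_img_convert_cmd(src_vmdk: str, dst_qcow2: str) -> List[str]:
    """Build an offline VMDK -> qcow2 conversion command (does not run it)."""
    return [
        "qemu-img", "convert",
        "-p",            # show progress
        "-f", "vmdk",    # source format
        "-O", "qcow2",   # output format
        src_vmdk,
        dst_qcow2,
    ]


# Hypothetical paths, for illustration only.
cmd = qemu_img_convert_cmd("/mnt/esxi/web01/web01.vmdk", "/var/lib/images/web01.qcow2")
print(" ".join(cmd))
```

To actually run it you would hand `cmd` to something like `subprocess.run(cmd, check=True)` on a host that has `qemu-img` installed, with the source VM powered off.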
So to each guest, it looks like a quick reboot with a quick hardware upgrade.
If that's coordinated properly, with a generic HA or Kubernetes setup, there's absolutely zero service downtime (if there are no serious mistakes), as it's just nodes within a cluster taken down one at a time while the others keep the services running, and state migrates among the nodes which are live.
Most of the things you'll change when migrating are the same for large numbers of VMs that are configured the same way except for their disk images and minor things like MAC/IP. So after you've verified a small number, you can go right ahead and script the migrations for another thousand VMs, even doing them in parallel.
You don't need to migrate all VMs at the same time, and you shouldn't do that anyway. So the temporary hardware / cloud cost can be in the low single-digit percentage (for a few weeks to months at 40k VM scale, a few hours to days at 10 VM scale). You probably have some slack in there already, though, so might not need any additional hardware.
The 40k servers are probably made up of multiple redundant vSphere clusters with failover. You simply take one of those redundant clusters and migrate one half of it over. Then the other half. Then duplicate that process. As you build more compute in the new stack, you can decommission more and more of the old stack and convert it. The transition would progress like a cascade, with larger and larger groups of clusters being migrated at once until you're left with the one-off, ad-hoc, weirdo clusters at the end that need to be manually migrated (usually with great effort).
The actual hardware servers are clustered together into pools of resources. The pools are where the VMs live. The bigger the new pool becomes, the faster you can empty the old one. So the migration starts very slowly, ramps up quickly, and then tapers off.
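The cascade can be illustrated with a toy simulation. Everything here is an assumption made for the sake of the sketch: hosts are identical, each converted host can ingest a fixed number of VMs per round, and an old host is rebuilt for the new stack as soon as the remaining VMs fit on fewer old hosts. None of the parameter values come from the thread.

```python
import math
from typing import List


def simulate_cascade(total_vms: int, hosts: int, vms_per_host: int,
                     seed_hosts: int, ingest_per_host: int) -> List[int]:
    """Return the number of VMs migrated in each round of a rolling conversion."""
    new_hosts = seed_hosts
    old_hosts = hosts - seed_hosts
    remaining = total_vms
    migrated = 0
    per_round = []
    while remaining > 0:
        free_slots = new_hosts * vms_per_host - migrated
        # Limited by migration throughput, free capacity, and what is left.
        moved = min(new_hosts * ingest_per_host, free_slots, remaining)
        if moved <= 0:
            raise ValueError("no free capacity on the new stack")
        remaining -= moved
        migrated += moved
        per_round.append(moved)
        # Old hosts that are no longer needed get rebuilt as new-stack hosts.
        needed_old = math.ceil(remaining / vms_per_host)
        freed = max(0, old_hosts - needed_old)
        old_hosts -= freed
        new_hosts += freed
    return per_round


rounds = simulate_cascade(total_vms=1000, hosts=12, vms_per_host=100,
                          seed_hosts=2, ingest_per_host=50)
print(rounds)
```

In this model the per-round count grows as freed hosts join the new pool and then levels off at whatever the spare capacity allows. The slow tail described above (the one-off, ad-hoc clusters) is not modeled; that part is manual effort rather than capacity.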
> You simply take one of those redundant clusters and migrate one half of it over.
For that half you are migrating, you are essentially operating without redundancy. If these are serious production workloads, the tradeoff is not as simple as you make it seem.
The way a cluster works is you have a giant pool of resources. Say, 33 - 50% larger than the workload. The workload is a dozen VMs. The cluster is 8 giant compute servers and two giant storage servers acting as one giant compute and storage unit. For redundancy you have extra clusters laying around with no workload, but they are added as failovers.
Normally, if one server on a production cluster goes down, the other members of that cluster seamlessly will take over. This is where the extra capacity comes in. You don't migrate the workload to another cluster. You just lose overhead capacity. If you lose too much then you start migrating parts of the workload to the failover. Not the entire thing.
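The headroom argument is easy to check with numbers. A minimal sketch, assuming identical hosts and measuring both capacity and workload in the same abstract units (the helper names and example figures are illustrative, not from the thread): an 8-host pool sized about 33% above its workload can lose two hosts before anything has to move to a failover cluster.

```python
import math


def survives(hosts: int, capacity_per_host: float, workload: float, failures: int) -> bool:
    """True if the surviving hosts can still carry the whole workload."""
    return (hosts - failures) * capacity_per_host >= workload


def max_tolerable_failures(hosts: int, capacity_per_host: float, workload: float) -> int:
    """How many hosts can fail before the workload no longer fits."""
    return hosts - math.ceil(workload / capacity_per_host)


# 8 hosts of 1 unit each carrying 6 units of work: pool is ~33% larger than the workload.
print(max_tolerable_failures(hosts=8, capacity_per_host=1, workload=6))
print(survives(hosts=8, capacity_per_host=1, workload=6, failures=3))
```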
You usually don't have to use your redundant cluster at all until it's time to rebuild the failed cluster. You might pick one of these spare clusters you keep around for redundancy to migrate all or part of the production workload to while you fix the production cluster.
When doing a big migration you take a percentage of your redundancy and convert it to the new environment. This is your staging environment. Once it is capable of doing work, you slowly grow it out and shrink the old environment at the same time.
This is basically how HA works with VxRail. I buy more VxRail capacity than I will actually use, because if a node fails then the VMs can be moved, not always without downtime but with no loss. If I run out of HA nodes or start running low on capacity, then Aria will start sending alerts.
Ha I have done migrations recently from vSphere to vSphere using vMotion and it was easy.
But it still took a duplicate set of HW and I couldn’t imagine doing it without a lot of IaC and automation in place (plus physical space, power and cooling).
40k server workloads… what a good, objective, quantifiable unit of measure. I’m sure the author’s intent was to make it sound like 40k servers or virtual machines. Either way, those numbers would be insane. Sensationalist clickbait headline.
Could very well just have been that last stubborn server they just never got round to!
They've got a little over 5,000 stores. Probably a few different offices. Each store probably has several VMs so it can continue to run isolated if its connectivity to the mothership disappears briefly. You'll probably want the basics of running some kind of local auth service (maybe AD, maybe something else), your system running the POS platform which might even be an app server and a database server as separate VMs, probably some VM running various building management stuff, a VM to run the security camera platform, I dunno what else.
That's just for the operations of the local stores and we're now at probably >25,000 VMs and we've only touched the retail locations. We still haven't addressed logistics locations, corporate offices, stuff to manage their customer-facing applications and websites, etc.
When I first saw 40,000 VMs I too thought it was a bit excessive, but when you're an org with several thousand locations that you want to be somewhat self-sufficient things add up quickly!
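The estimate is easy to reproduce. The roles below are the ones listed above; one VM per role is an assumption, and real stores may well run more (or fewer).

```python
# Per-store VM roles named in the comment; one VM each is an assumption.
STORE_VMS = {
    "local auth": 1,
    "POS app server": 1,
    "POS database": 1,
    "building management": 1,
    "security cameras": 1,
}
STORES = 5_000        # "a little over 5,000 stores"
REPORTED_VMS = 40_000  # figure from the headline


def store_vm_estimate(stores: int = STORES, per_store: dict = STORE_VMS) -> int:
    """Total VMs across all stores for the given per-store layout."""
    return stores * sum(per_store.values())


estimate = store_vm_estimate()
remaining = REPORTED_VMS - estimate
print(estimate, remaining)  # 25000 15000
```

That leaves roughly 15,000 VMs for logistics, corporate offices, and customer-facing systems, which does not look unreasonable.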
Online ordering. Backend logistics (like icebergs, most of a supermarket is invisible). Stocktaking. Financials. They've probably got several role-isolated servers per store, each with a backup.
Is it that hard to imagine? They do 100B USD / yr revenue as a supermarket chain with 330k employees and a massive logistics operation. The software supporting the whole shebang is not gonna run on a spare macbook pro in a cupboard.
You motivated why they need a big database. Still unclear to me why they need 40k VMs and not just 100.
It's OK for Amazon to do it, since they paid for the physical machines anyway and they want to dogfood their AWS services. It does not make sense for someone who rents compute and licenses.
How do you imagine cash registers work these days? With 5,000 stores worldwide, 40k servers is 8 computers per store, which doesn't seem excessive to me.
5000 stores, let's say 10 checkout lines per store, just to overprovision, so at worst 50000 simultaneous transactions going on, but probably way less. You can do that with a single server, but you'd want some redundancy and spare capacity.
I worked with a Danish retailer with 3000+ stores in ~50 countries, and even adding their webshop on top they were closer to 200 (maybe 300) servers (mostly VMs). Then you need the ActiveDirectory, office IT, all that stuff, with redundancy and it adds up quickly... but not to 40K.
What I will say that people forget is that production might be 8 beefy VMs, but you still need to replicate that to a number of test systems, staging environments and so on. So an 8-node production cluster becomes maybe 24 servers when accounting for those other environments.
I'd assume most of the big supermarkets have a 4-5 host cluster with the small local stores having a 2-3 host cluster. You've got the software to run the tills sure, but also the loyalty card system (which seems to have a local cache at each site based on how quickly it returns your first name), VoIP, Door Access systems, BMS, Digital Signage, Scan as you shop systems, CCTV, Stock management systems..
I imagine that cash registers have the cpu power of at least a cellphone and that they can store transactions over internet in the central company database.
at Tesco scale you really dont want a central database even if its big and high availability. if one store loses connection because of isp issues it would shut down everything and make it impossible to serve customers in that location. a global outage would cost billions. latency is also a big deal when you got a crowd lining up in front of every checkout lane and the self serve machines, at that point it can even turn into a safety/liability issue.
and dont forget that different countries have different rules about customer private data and payment information. if you send eu customer info into the uk for processing you might be breaking privacy laws. some places do automatic tax reporting where you need to send info to a country specific tax office api, get back a code and print it on the receipt.
cash registers dont work alone. they're connected with inventory management, employee perf monitoring, payment processing and other things that make more sense as a store local service sending regular reports to corporate instead of waiting for round trips on every operation.
Lots of these machines are dynamically created and provisioned depending on load factors nowadays.
The days of manually setting up servers in hyperscaling-environments are long long gone.
Example: Your GitLab CICD needs Runners. They are dynamically requested "somewhere in our cloud somewhere in the world" and then spun up and configured fully automatically. No human touches this stuff anymore.
As someone who has never dealt with anything close to this scale, why would it take 18 months to migrate? Is this poor config management, a lack of automation, or something else?
I work in automating conversions off VMware and 40,000 VMs is just a lot of data to move. We could probably do 500-1000 / day which would be 3 months, but that would be best case, and there's a lot of prequalification where you examine classes of VMs to check what software they're running and identify the unsupported / difficult cases. That planning would add extra months.
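For a sense of the raw arithmetic, here is a quick sketch using the throughput range above. It assumes conversions run every calendar day at a steady rate, which is optimistic, and it ignores the planning and prequalification time entirely.

```python
def migration_days(total_vms: int, vms_per_day: int) -> float:
    """Days of pure conversion work at a steady daily rate."""
    return total_vms / vms_per_day


TOTAL_VMS = 40_000

fast = migration_days(TOTAL_VMS, 1_000)  # upper end of the quoted rate
slow = migration_days(TOTAL_VMS, 500)    # lower end of the quoted rate
print(fast, slow)  # 40.0 80.0
```

So 40 to 80 days of conversion alone, i.e. up to about 3 months at the slower rate, and longer if work only happens on weekdays or in scheduled maintenance windows.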
In some cases you can do zero copy conversions, so downtime can be kept to a few minutes, but it relies on the customer having very particular storage configurations (NetApp basically). In other cases there can be significant downtime that needs to be scheduled. I worked one case where the customer shut down several production lines over a number of weekends so we could convert the workloads. (Everything was meticulously planned, along with fallbacks that thankfully we did not need to use.)
Some things you don't convert at all. Databases generally get replicated at DB level to new hardware. Single-purpose appliances need to be reprovisioned by going back to the vendor and asking for a KVM equivalent.
Then there's all kinds of craziness, like we had customers who rolled their own backup solutions where we had to add special cases to the software to detect and ignore the backup partitions. Or people running Windows 95 or RHEL 3 (for real!) where there are no virtio drivers and we don't certify the hypervisor so it requires support exceptions. At this point people have been using VMware for nearly 30 years, there's all kinds of crazy legacy.
VMware is like Active Directory: been around for ages, wildly flexible, and has a habit of seeping structural debt into every line of business the organization undertakes.
The ecosystem existed - Red Hat OVE, Nutanix were mature in the enterprise market, but VMware had such a grip with low cost (honestly, it was a smart Broadcom play despite infuriating customers), ease of use, feature rich, big software/support/consultant ecosystem. It was almost like the Windows of virtualization, in that it bred complacency.
I know nothing about Tesco, but sometimes ops cultures lack the skills or mandate to successfully switch tech stacks.
At that scale it is almost always easier to run your own infrastructure. Like, I’m not kidding, kubernetes will handle it fairly easily. Get a DevOps engineer or a good consulting agency and run your cluster on Hetzner. This saved us insane amounts of money. No need to buy infrastructure outright but simply moving off the cloud will easily squash your bill by 50% if not more.
My hope from this headline was that some open source solution was functionally equivalent from a business perspective. But then I read that Tesco has had to:
> procure alternative solutions with reduced functionality
meaning VMware is still basically the only option if you need something that works out of the box. Hopefully this changes in the mid term as other customers migrate away.
The competition is compelling, actually. Red Hat OVE, Nutanix for those who want support, and Proxmox is emerging as a possibility in the ent space.
I read "reduced functionality" as they married themselves to something specific and non-portable, like oh, pick a card from VMware - NSX networking, VSAN storage, maybe something in Tanzu, and that phrase reflected their difficulty escaping the lock-in quickly enough. (This was all speculation)
It's incredible how hard it is for firms to migrate away from platforms. Clearly you could just give something away for nearly free for 20 years and then jack the price up and make bajillions.
Even better if you can charge a mildly high license fee for 20 years first and then jack it up to something outrageous and still have customers who just can't drop you.
i just had to spend a bunch of time (not for work, for hobby purposes) bc broadcom acquired bitnami or something and then decided to kill off the free docker images for various software. very very annoying. can't believe they did this by just yanking the images from the registry too, leaving nodes to fail if they lose their image cache and have to restart
Before AI, the cloud was the big thing. It took years for companies to understand the risk of hosting on someone else’s infrastructure, regardless of the initial cost savings. I’m somewhat happy to see reality sink in, though this specific case is quite alarming.
If AI survives, we’ll see inflated costs drive companies back to hiring actual human beings to do the work.
If anyone here is looking to move Greenplum workloads off Broadcom (or unsupported open source), email me miles.richardson@enterprisedb.com — I’m the PM for WarehousePG [0], an open source fork of Greenplum. We’ve got a cracked engineering team working hard to modernize it.
At EDB we’ve forked Greenplum from the last OSS release into WarehousePG, added over a dozen customers with petabytes of data, and hired a few dozen specialists. We have an extension for Lakehouse connectivity based on DataFusion (with optional offload to Spark including GPU acceleration) to read/write Iceberg. And we have a lot planned for the next version, which you might infer from the name: WarehousePG 19.