Hacker News
by whoknew1122 1299 days ago
It may be a risk borne by every cloud provider, but why does this only really happen to Microsoft among large providers?

As far as chip shortages, it probably helps that Amazon makes its own chips. Microsoft could do the same rather than running out of capacity and blaming chip shortages.

Microsoft had to know that at some point they were going to run out of capacity. They should've either done something about it or let customers know.

4 comments

There are all sorts of examples of AWS failing to provide capacity too. Just do a search for "aws InsufficientInstanceCapacity" or similar. I remember Fortnite talking about capacity limits in relation to an incident, but I'm struggling to find the post-mortem I saw it in.

Even when Microsoft was being open about Azure having difficulty getting Intel chips, AWS, GCP etc. were in the same position and just not really talking about it. From my time in AWS there were some other times when some services with specialised hardware came really, really close to running out of capacity and had to scramble around with major internal "fire drills" against services to recoup capacity.

Most people won't run into these issues, since the clouds all tend to be good at capacity planning, but they still happen.

There are also advantages from economies of scale and brand recognition. The more customers you have, the more the capacity trends smooth out and the easier it is to predict need, even if you're still stuck with uncertainty on the ordering side.

It’s certainly true I run into these things with AWS as well, but it’s generally limited to a specific instance type/availability zone combination. I’ve never had all instance types unavailable.

If anything, I’m surprised we can just spin up a few hundred instances out of nowhere and not run into capacity issues.
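For what it's worth, the "specific instance type/availability zone combination" point is why this is usually something you can work around: you try the next combination. Here is a rough sketch of that idea, not anyone's actual tooling. The `InsufficientCapacity` exception, the `launch` callback, and `fake_launch` are all hypothetical stand-ins (no real cloud SDK is called), and the instance type and zone names are only illustrative.

```python
from itertools import product


class InsufficientCapacity(Exception):
    """Hypothetical stand-in for a provider's capacity error."""


def launch_with_fallback(launch, instance_types, zones):
    """Try each (instance type, zone) pair in order until one launches.

    `launch` is a caller-supplied function (an assumption here) that either
    returns an instance handle or raises InsufficientCapacity.
    Returns the handle plus the list of combinations that were skipped.
    """
    tried = []
    for instance_type, zone in product(instance_types, zones):
        try:
            return launch(instance_type, zone), tried
        except InsufficientCapacity:
            tried.append((instance_type, zone))
    raise InsufficientCapacity(f"no capacity in any of {len(tried)} combinations")


def fake_launch(instance_type, zone):
    # Simulated provider: only one combination has capacity.
    if (instance_type, zone) == ("m5.large", "us-east-1b"):
        return f"{instance_type}@{zone}"
    raise InsufficientCapacity(instance_type, zone)


handle, skipped = launch_with_fallback(
    fake_launch, ["c5.large", "m5.large"], ["us-east-1a", "us-east-1b"]
)
print(handle, "after skipping", len(skipped), "combinations")
```

This only helps when the shortage is limited to some combinations. If every instance type in a region is unavailable, the loop ends in the same error and you are back to switching regions.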

AWS has capacity issues you can generally mitigate. Azure, however, will just lock you out of a solution completely and tell you to switch regions, as if that were some trivial thing.

They have a lot of technical debt. They have like 6 different clouds (at least 4 gov clouds alone) and SLA commitments to things like O365 that silo their infrastructure.

MS also makes all sorts of crazy deals and commitments, and I wouldn’t be surprised if being colocated with a strategic customer leads to local shortages of resources.

AWS has at least 3 publicly-discussed 'clouds' (or partitions, as they're called at AWS). There may or may not be other partitions that cannot be discussed publicly.

There’s a pretty clean demarc between the AWS clouds. With Microsoft, because they have O365 and Azure AD dependencies sprinkled everywhere with varying features, it’s a real mess. So you can do a government contract with a device managed by Windows Autopilot & Intune in a commercial cloud, have email in a Gov Community Cloud, and deliver apps in a US Gov cloud, all with different identities etc.

> As far as chip shortages, it probably helps that Amazon makes its own chips.

IDK what chips you are talking about. All x86 (which I assume is most of their compute) is Intel or AMD. If they make their own chips, they are only making the ones behind the ARM instances.

AWS has three processors: Graviton, Inferentia, and Trainium. They're made in-house.

https://aws.amazon.com/silicon-innovation/

And none of the above are x86. Even if they're making their own silicon, it is for specialized use (ML) and not general server provisioning.

Amazon's own chips are ARM. ARM requires somewhat specialized builds of software that are likely different from what runs on development instances, CI/CD, and/or local dev machines. It's not insurmountable but does certainly complicate usage.

Your local dev machines might be Macs though, in which case it might be easier for you to go with ARM servers than x86.

They might be. My local dev machine is a Mac. I've found Intel-only or Intel+ARM container images, never an ARM-only one. Again, not insurmountable, but certainly more resistance than the straight Intel route.
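If you want to check which of those cases an image falls into before committing to ARM servers, one option is to look at the architectures listed in its multi-platform manifest. A minimal sketch, assuming you already have the image's OCI image index as JSON: the `image_architectures` helper and the `EXAMPLE_INDEX` document are made up for illustration, and a real index also carries a digest and size for each entry.

```python
import json


def image_architectures(index_json):
    """Return the sorted CPU architectures listed in an OCI image index."""
    index = json.loads(index_json)
    archs = set()
    for manifest in index.get("manifests", []):
        # Each entry may carry a "platform" object naming its architecture.
        arch = manifest.get("platform", {}).get("architecture")
        if arch:
            archs.add(arch)
    return sorted(archs)


# Illustrative, trimmed-down index for an Intel+ARM image (not real data).
EXAMPLE_INDEX = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"platform": {"architecture": "amd64", "os": "linux"}},
        {"platform": {"architecture": "arm64", "os": "linux"}},
    ],
})

archs = image_architectures(EXAMPLE_INDEX)
print("arm64 supported:", "arm64" in archs)
```

An image whose list contains only `amd64` is the Intel-only case and would need emulation or a rebuild to run on ARM instances.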