Hacker News
by Sanguinaire 1715 days ago
It's weird how the 3 major clouds have taken different paths to what must be an almost identical resource allocation problem.

AWS has had this kind of spot instance for years, but with a 2-minute grace period rather than the 30 seconds GCP is offering. Azure and GCP both originally went with a 24-hour cutoff (which can easily be replicated on a regular spot instance if needed), but now GCP are backing off on that restriction.
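Here is a minimal sketch of imposing that cutoff yourself on a regular spot instance. The helpers compute when a workload should start winding down so it gets the same warning a provider notice would give. The 24-hour cap and the grace periods (2 minutes for AWS, 30 seconds for GCP) come from the comment above. The function names are made up for illustration, and actually stopping the VM (e.g. calling `shutdown`) is left out.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the comment above; treat them as assumptions, since
# providers can change their notice periods.
MAX_LIFETIME = timedelta(hours=24)
GRACE_PERIODS = {"aws": timedelta(minutes=2), "gcp": timedelta(seconds=30)}


def wind_down_deadline(started_at: datetime, grace: timedelta,
                       max_lifetime: timedelta = MAX_LIFETIME) -> datetime:
    """Moment a self-imposed lifetime cap should trigger cleanup,
    leaving `grace` worth of time before the cap is reached."""
    return started_at + max_lifetime - grace


def should_wind_down(started_at: datetime, now: datetime, grace: timedelta,
                     max_lifetime: timedelta = MAX_LIFETIME) -> bool:
    """True once the workload should checkpoint and shut itself down."""
    return now >= wind_down_deadline(started_at, grace, max_lifetime)


if __name__ == "__main__":
    # Hypothetical instance that booted at midnight UTC.
    start = datetime(2020, 1, 1, tzinfo=timezone.utc)
    print(wind_down_deadline(start, GRACE_PERIODS["gcp"]))  # 2020-01-01 23:59:30+00:00
```

A watchdog (a cron job or systemd timer, say) could call `should_wind_down` periodically and run the same cleanup path used for real preemption notices. That way both cases exercise one code path.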


I don't remember Azure having the 24-hour cutoff. My spot/low-priority VMs used to run for weeks.

I used low-priority/spot VMs both in scale sets and, more recently, as single VMs.

Also, I wouldn't call the cutoff and the grace period "paths". There were much more substantial differences between the clouds.

I seem to remember the Azure 24-hour limit being in place in 2018, though it wasn't highlighted particularly well and came as a surprise to me. It could well have been removed since.

I suspect the 24-hour limit is there to prevent angry calls from customers who buy spot instances because they're cheaper, then wonder why their VM randomly went down and blame it on Azure.

(I work for Azure but don't know anything about the policies)

They probably thought a 24-hour limit could score some design wins; their customers may have proved them wrong.

Both approaches have their wins. I train models on spot VMs, and a training run can take more than 24 hours. I set my max price higher than the reserved price, so there is only a very slim chance that training gets stopped, and I still get an 80% saving on average. I don't want to spend time writing the complex logic needed to resume training after a spot instance dies.
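The resume logic doesn't have to be large. Here is a minimal sketch. It assumes the training state fits in a small JSON-serializable dict, and that the checkpoint file lives on storage that outlives the VM (a persistent disk or a synced bucket, for example). `save_checkpoint`, `load_checkpoint`, and `train` are hypothetical names, and the "training step" is a placeholder. A real job would save model and optimizer state with its framework's own serialization.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure bytes hit the disk before the swap
    os.replace(tmp_path, path)  # atomic rename: readers see old or new, never half


def load_checkpoint(path: str) -> dict:
    """Resume from the last checkpoint, or start fresh if there is none."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(path: str, total_steps: int, checkpoint_every: int = 100,
          stop_after: Optional[int] = None) -> dict:
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        # Placeholder for a real optimizer update (hypothetical).
        state["loss"] = 1.0 / (step + 1)
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
        if stop_after is not None and state["step"] >= stop_after:
            # Simulated preemption: work since the last checkpoint is not on disk.
            return state
    save_checkpoint(path, state)
    return state
```

Writing to a temporary file and then calling `os.replace` means a preemption in the middle of a save leaves the previous checkpoint intact instead of a half-written file. Rerunning `train` after a restart picks up from the last saved step.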

For services, though, GCP preemptible instances are a perfect match for Kubernetes.
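Preemptible nodes suit Kubernetes services partly because a stateless Deployment with several replicas can lose one node and reschedule. Here is a rough sketch in Python that emits a manifest `kubectl apply -f` accepts as JSON. The hypothetical `stateless_deployment` helper prefers preemptible nodes, spreads replicas across nodes, and caps the pod termination grace period below GCP's 30-second notice. It assumes GKE's `cloud.google.com/gke-preemptible` node label. The 25-second grace value and the replica count are illustrative choices. Pods only get to use that grace period if the node's shutdown handling actually drains them.

```python
import json

# Label GKE applies to nodes in preemptible node pools (assumed here;
# Spot VM node pools use cloud.google.com/gke-spot instead).
PREEMPTIBLE_LABEL = "cloud.google.com/gke-preemptible"


def stateless_deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Deployment that prefers preemptible nodes but can fall back to regular
    ones, and spreads replicas so one preemption removes only part of capacity."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Keep shutdown within GCP's 30-second preemption notice.
                    "terminationGracePeriodSeconds": 25,
                    "affinity": {
                        # Soft preference for preemptible nodes.
                        "nodeAffinity": {
                            "preferredDuringSchedulingIgnoredDuringExecution": [{
                                "weight": 100,
                                "preference": {"matchExpressions": [{
                                    "key": PREEMPTIBLE_LABEL,
                                    "operator": "In",
                                    "values": ["true"],
                                }]},
                            }]
                        },
                        # Soft preference for putting replicas on different nodes.
                        "podAntiAffinity": {
                            "preferredDuringSchedulingIgnoredDuringExecution": [{
                                "weight": 100,
                                "podAffinityTerm": {
                                    "labelSelector": {"matchLabels": labels},
                                    "topologyKey": "kubernetes.io/hostname",
                                },
                            }]
                        },
                    },
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(stateless_deployment("web", "nginx:1.25"), indent=2))
```

Using a *preferred* rather than *required* node affinity lets the scheduler fall back to regular nodes when the preemptible pool is empty, for example right after a wave of preemptions.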