by ocdnix 2154 days ago
Turns out this is about EC2 spot instances for ECS. How would it compare to ECS Fargate spot these days?

I'm also missing a discussion about designing for interruption, either by not keeping state, or by being able to shed state quickly, to be picked up by other instances.
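For what it's worth, a rough sketch of what "shed state quickly" could look like in a worker process. It assumes the platform delivers SIGTERM shortly before reclaiming capacity (ECS sends SIGTERM when it stops a task, and Spot interruptions come with a roughly two-minute notice). The `DrainingWorker` class and its `handed_back` list are made up for illustration; the list stands in for whatever shared queue or store the other instances would read from.

```python
import signal
import threading


class DrainingWorker:
    """Hypothetical worker that stops accepting work once told to stop."""

    def __init__(self):
        self.stopping = threading.Event()
        self.in_flight = []    # items accepted but not yet finished
        self.handed_back = []  # stand-in for a shared queue (assumption)

    def install(self):
        # Register the handler; must be called from the main thread.
        signal.signal(signal.SIGTERM, self.on_stop_signal)

    def on_stop_signal(self, signum=None, frame=None):
        # Only flip a flag here; the real work happens outside the handler.
        self.stopping.set()

    def accept(self, item):
        # Refuse new work once an interruption has been signalled.
        if self.stopping.is_set():
            return False
        self.in_flight.append(item)
        return True

    def shed_state(self):
        # Hand unfinished items back so another instance can pick them up.
        self.handed_back.extend(self.in_flight)
        self.in_flight.clear()
        return list(self.handed_back)
```

The main loop would check `stopping` between items and call `shed_state()` before exiting, ideally well inside the notice window.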

Also, if you set up EC2 spot with a launch template or ASG with very differently-sized instance types (to reduce risk of running out), is there a way to even out the load coming through an ALB? The least-connections scheduling can help in some cases, but a connection might not map 1:1 to one unit of load. The ALB can use weighted balancing, but on the target group level. Dunno how easy it would be to allocate different instance sizes to different target groups and weigh them accordingly.
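To make the target-group idea concrete, here is a minimal sketch of how the weights might be derived, assuming one target group per instance size and that vCPU count is a fair proxy for capacity (it may not be for your workload). The function name and the group names are hypothetical. ALB forward-action weights are integers from 0 to 999, hence the range check.

```python
import math
from functools import reduce


def target_group_weights(groups):
    """Return integer weights proportional to each group's total vCPUs.

    groups maps a target group name to (instance_count, vcpus_per_instance).
    """
    # Traffic is split between groups first, so the weight has to follow the
    # group's total capacity, not the size of a single instance.
    capacity = {name: count * vcpus for name, (count, vcpus) in groups.items()}
    if not capacity or any(c <= 0 for c in capacity.values()):
        raise ValueError("every target group needs positive capacity")
    divisor = reduce(math.gcd, capacity.values())
    weights = {name: c // divisor for name, c in capacity.items()}
    if max(weights.values()) > 999:
        raise ValueError("weights exceed the ALB range of 0 to 999")
    return weights


# Example: six 2-vCPU instances in one group, three 8-vCPU instances in another.
weights = target_group_weights({"tg-large": (6, 2), "tg-2xlarge": (3, 8)})
```

The catch is that the weights go stale whenever either group scales, so something would have to recompute and reapply them on every scaling event.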


AFAIK with Fargate a lot of this is handled for you, as long as you have the auto scaling group.

We have this setup with two capacity providers (FARGATE_SPOT and FARGATE) with a 75/25% split, meaning that even if there are no spot instances available we will still be up.
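Roughly, a 75/25 split like that corresponds to a capacity provider strategy with weights 3 and 1. The sketch below only builds the structure in the shape the ECS API takes for `capacityProviderStrategy` (for example in boto3's `create_service`); the helper names are hypothetical and the weights and `base` value are assumptions, not our exact config.

```python
def spot_split_strategy(spot_weight=3, on_demand_weight=1, on_demand_base=0):
    """Build a capacity provider strategy mixing Fargate Spot and Fargate."""
    return [
        {"capacityProvider": "FARGATE_SPOT", "weight": spot_weight},
        # "base" is the number of tasks placed on this provider before the
        # weights are applied; 0 here is an assumption.
        {
            "capacityProvider": "FARGATE",
            "weight": on_demand_weight,
            "base": on_demand_base,
        },
    ]


def expected_share(strategy):
    """Approximate fraction of tasks per provider, ignoring any base."""
    total = sum(p["weight"] for p in strategy)
    return {p["capacityProvider"]: p["weight"] / total for p in strategy}


share = expected_share(spot_split_strategy())
```

Setting a non-zero `base` on `FARGATE` would guarantee a fixed floor of on-demand tasks regardless of the ratio.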

The benefit of Fargate is that we don't need to care whether certain instance sizes are available, as AWS handles that.

Cool. When Fargate launched it didn't have a Spot option (AFAIK), and since we run ECS on Spot instances, switching to Fargate would have meant a massive increase in cost. If it can now use underlying Spot capacity, it might be worth looking at again.
Yeah spot capacity providers for Fargate only got added a few months ago, been running well for us in production.
(Not the OP, but running a fairly similar setup, albeit for EKS nodes rather than ECS nodes :) )

Fargate Spot is about a third of the price of Fargate (at least in eu-west-1 now according to: https://aws.amazon.com/fargate/pricing/ ); so the savings are roughly identical.

Re risk of running out, our current strategy is to use different-but-closely-similar instance groups; so for example we have an autoscaling group running a mix of:

- m5.large
- m5dn.large
- m5n.large
- m5ad.large
- m5d.large

Which are the same price on Spot instances, but I'd wager it'd be pretty rare to have all these families reclaimed at once.
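As an illustration only (not necessarily how our groups are defined), a mix like this could be expressed as a `MixedInstancesPolicy` in the shape the Auto Scaling API accepts, for instance via boto3's `create_auto_scaling_group`. The launch template name, the `capacity-optimized` allocation strategy, and the helper function are all assumptions.

```python
SIMILAR_TYPES = ["m5.large", "m5dn.large", "m5n.large", "m5ad.large", "m5d.large"]


def mixed_instances_policy(launch_template_name, instance_types,
                           on_demand_percentage=0):
    """Build a MixedInstancesPolicy dict spreading Spot across similar types."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,  # hypothetical name
                "Version": "$Latest",
            },
            # One override per interchangeable instance type.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # 0 means everything above the base capacity runs on Spot.
            "OnDemandPercentageAboveBaseCapacity": on_demand_percentage,
            "SpotAllocationStrategy": "capacity-optimized",  # assumption
        },
    }


policy = mixed_instances_policy("workers-template", SIMILAR_TYPES)
```

Keeping the types the same size (2 vCPU, 8 GiB here) means the scheduler can treat every node in the group as equivalent.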

(We also use some on-demand only ASGs with lower priority in the cluster-autoscaler to ensure that if it _does_ happen, then we'll have a fallback)