Hacker News
by gingerlime 2724 days ago
I played around with Fargate and one of the things I couldn’t work out is how to scale quickly (or quickly enough). I think the problem wasn’t purely Fargate but actually the load balancer. Even though containers were launched and responsive, the load balancer needed something like 3 liveness responses to bring a container into the pool, and the time between each probe was something like 30 or 10 seconds and not very flexible (sorry, my memory is fuzzy)... so this felt like it only really fits loads that aren’t very spiky, and the potential saving from scaling down is also somewhat reduced.

Did anyone experience something similar? Or maybe I did something wrong?

If one of the benefits of Firecracker is quick spin-up time, then this only works if the load balancer also responds quickly, doesn’t it?

Granted, it was a while ago so things might have changed.

2 comments

Fargate definitely takes some time to figure out. It took a while for us to realize that we needed to bump up our instance sizes because the default instance was a t2.micro.

However, now that we have it configured properly (it took about 6 hours over the span of 3 days to catch the issues), we flawlessly serve 11M API requests/day without a problem. We were running these on DO boxes, moved them over to Elastic Beanstalk, which caused more problems than it was worth, and finally landed on Fargate.

Tried EKS, but it was a bit more cumbersome than we would have liked for a K8s service. (We run another product of similar scale on K8s via GKE).

If you're looking for something closer to Heroku than K8s, then Fargate is a decent option.

That's configurable - https://docs.aws.amazon.com/elasticloadbalancing/latest/appl... - default `HealthyThresholdCount` is 5 and `HealthCheckIntervalSeconds` is 30 seconds.

We adjust those down - somewhere around 10-15 seconds for `HealthCheckIntervalSeconds` and 3 for `HealthyThresholdCount` works pretty well.
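
For anyone who wants to script that change, here is a minimal sketch in Python. It only builds the keyword arguments; the two health-check parameter names come from the comment above, the lower bounds are the ALB minimums quoted elsewhere in this thread, and the helper name and the ARN are hypothetical placeholders.

```python
def health_check_settings(target_group_arn, interval_seconds=10, healthy_threshold=3):
    """Build keyword arguments for tuning a target group's health check.

    The lower bounds are the ALB minimums mentioned in this thread
    (5 second interval, 2 successes); treat them as assumptions and
    check the current docs before relying on them.
    """
    if interval_seconds < 5:
        raise ValueError("interval below the 5 second minimum quoted for the ALB")
    if healthy_threshold < 2:
        raise ValueError("threshold below the minimum of 2 quoted for the ALB")
    return {
        "TargetGroupArn": target_group_arn,
        "HealthCheckIntervalSeconds": interval_seconds,
        "HealthyThresholdCount": healthy_threshold,
    }


# Hypothetical ARN, for illustration only.
params = health_check_settings("arn:aws:elasticloadbalancing:example:targetgroup/demo")
print(params)

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("elbv2").modify_target_group(**params)
```

The defaults here (10 seconds, 3 successes) just mirror the values suggested above; they are not a recommendation beyond that.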

The fastest you can go on the Application Load Balancer is a health check every 5 seconds, with 2 successes being enough to put the machine in service, which means a minimum 10 second lag.

The Network Load Balancer is technically more scalable (able to accept more connections per second from the outside), but has a longer minimum inclusion time - 2 checks at a 10 second interval, so 20 seconds.
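
The arithmetic behind those numbers is just interval times required successes. A small Python sketch, assuming that simple model (it ignores when the first probe happens to land and any container boot time); the function name is made up for illustration:

```python
def min_inclusion_lag(interval_seconds, healthy_threshold):
    """Rough lag before a new target is in service: interval x required successes."""
    return interval_seconds * healthy_threshold


# (interval in seconds, healthy threshold) pairs taken from this thread.
settings = {
    "ALB defaults": (30, 5),
    "ALB tuned": (15, 3),
    "ALB fastest": (5, 2),
    "NLB fastest": (10, 2),
}

for name, (interval, threshold) in settings.items():
    print(f"{name}: ~{min_inclusion_lag(interval, threshold)}s")
```

Under this model the defaults come out at roughly 150 seconds, versus 10 seconds for the fastest ALB settings.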

So yeah, you want slightly beefier containers if you're scaling up and down heavily. But all this is pretty moot - whatever autoscaling parameters you set, the reaction time of the CPU / RAM usage analysis is still going to be minutes. It seems like this is okay for now.

If you really want super fast scaling, use a Go function on Lambda (outside a VPC). With Firecracker improvements the cold start time should be barely noticeable, and you'll ramp up pretty quickly.

That’s still at least 30 seconds (plus boot time for the container), so it felt too slow for some use cases, in my opinion.

You get into some fundamental signal-processing type issues in terms of how quickly you respond to increases in a given value (incoming requests in this case, but it's a general issue) vs. (in this case) spinning up too many things and overcharging the customer. There's a limit to how reactive Amazon can be here, even in theory. You may have to do some pre-sizing if your needs are that great, and choose to take a possible over-provisioning hit vs. a possible underprovisioning hit. I think it's pretty obvious why Amazon would choose to bias in the underprovisioning direction in this case.

(There's some really good stuff in the signal processing field for anyone responsible for high-scale systems. An underrated branch of math for computer programmers. Believe it or not, the "fundamental limits" I'm referring to are the same ones involved in the Heisenberg Uncertainty Principle, when you get down into it.)
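
To make the responsiveness tradeoff concrete, here is a toy sketch in Python. It assumes the load metric is smoothed with an exponential moving average before being compared to a scale-up threshold; that smoothing scheme, the function name, and all the numbers are illustrative assumptions, not a description of how AWS autoscaling actually works.

```python
def steps_to_react(alpha, threshold=0.75, baseline=0.2, spike=1.0, max_steps=1000):
    """Count samples until a smoothed metric crosses the scale-up threshold.

    The load jumps from `baseline` to `spike` at step 1. A smaller `alpha`
    means heavier smoothing: less overreaction to noise, but a slower response.
    """
    smoothed = baseline
    for step in range(1, max_steps + 1):
        # Standard exponential moving average update.
        smoothed = alpha * spike + (1 - alpha) * smoothed
        if smoothed >= threshold:
            return step
    return None  # never crossed within max_steps


for alpha in (1.0, 0.5, 0.1):
    print(f"alpha={alpha}: reacts after {steps_to_react(alpha)} sample(s)")
```

With no smoothing the spike is seen on the first sample, while heavy smoothing takes many more samples to cross the same threshold. That delay is the price of not scaling up on every noisy blip.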