by ryderm 4133 days ago
So you have short timeouts and retries that are load balanced to different nodes. But ideally your services are fast even at the 99th percentile, so this isn't an issue. This is much easier to achieve in a small service than in a huge, complex one.
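For illustration, here is a rough sketch of "short timeouts plus retries on different nodes". The `send(node, timeout)` callable, the default timeout, and the attempt count are all hypothetical stand-ins, not anything from a specific load balancer or client library.

```python
import random


def call_with_retries(nodes, send, timeout=0.2, max_attempts=3):
    """Try up to max_attempts distinct nodes, each with a short timeout.

    `send(node, timeout)` is a hypothetical callable that performs the
    request and raises TimeoutError if the node does not answer in time.
    """
    if not nodes:
        raise ValueError("no nodes to call")
    # Pick distinct nodes so a retry never lands on the node that just timed out.
    candidates = random.sample(list(nodes), k=min(max_attempts, len(nodes)))
    last_error = None
    for node in candidates:
        try:
            return send(node, timeout)
        except TimeoutError as exc:
            last_error = exc  # remember the failure and try the next node
    raise last_error
```

Note that every retry is extra work for some other node, which is exactly the cost the reply is worried about.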

Um, maybe. Say you have 5 machines behind a load balancer. In the normal case load is even: 100 requests to each server. One server starts running into trouble and exceeds its timeouts. Your load is now ~125 per server on the remaining four, because clients retry their timed-out requests on the other nodes. Is 125 enough to push the others over a "slow" threshold? If it is, that further magnifies the load.
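The arithmetic above can be sketched as a toy model. It assumes total demand stays fixed at 500 requests and is spread evenly over whichever servers are still healthy; the `slow_threshold` values are made up for illustration, and the extra attempts wasted on the failing server are ignored.

```python
def load_per_healthy_server(total_requests, servers, failed):
    """Even share of the total load across the servers that still respond."""
    healthy = servers - failed
    if healthy <= 0:
        raise ValueError("no healthy servers left")
    return total_requests / healthy


def survivors_after_cascade(total_requests, servers, slow_threshold):
    """Count servers left standing after one initial failure.

    Toy assumption: a server tips over as soon as its share of the load
    exceeds slow_threshold, and its traffic is then retried on the rest.
    """
    healthy = servers - 1  # one server is already in trouble
    while healthy > 0 and total_requests / healthy > slow_threshold:
        healthy -= 1
    return healthy


total = 5 * 100  # 5 machines, 100 requests each
print(load_per_healthy_server(total, 5, 0))  # 100.0
print(load_per_healthy_server(total, 5, 1))  # 125.0
print(survivors_after_cascade(total, 5, slow_threshold=130))  # 4
print(survivors_after_cascade(total, 5, slow_threshold=120))  # 0
```

With a threshold of 130 the other four absorb the extra load; at 120 each failure pushes the next server over until nothing is left.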

The load balancer will spin up more machines, so now you have 10 machines leaning on whatever the back end is.

Yes, your approach is great, but you really have to understand the failure modes. If you're living on the edge, you could have a pretty un-fun cascading error.

That's what circuit breakers are for: preventing cascading errors.
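For anyone who hasn't seen one, a minimal circuit breaker might look something like this. It is a sketch, not any particular library's API: the class name, `max_failures`, `reset_after`, and the injectable `clock` are all assumptions chosen for illustration, and it is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling more load on a sick backend.
                raise CircuitOpenError("circuit open, failing fast")
            # Otherwise we are half-open: let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The point is that once the breaker opens, callers stop sending requests (and retries) to the struggling service for a while, so one slow node doesn't drag down everything leaning on it.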