HN Mirror


by Twirrim 2455 days ago

I worked for a major cloud service a few years back, and got a wonderful introduction to how load balancers are, for the most part, somewhat awful.

They work better when there's just one in front of a fleet of servers, since it then has the total picture of what is going on, but of course that's quite the bottleneck. So you get two LBs, or more, and each one only has its own notion of what the back-end fleet is doing. There's no standard feedback mechanism to them at all.
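To make the partial-view problem concrete, here is a minimal sketch, assuming each LB does least-connections routing using only the requests it has sent itself. The `LocalLeastConn` class, the `spread` helper, and the backend names are all made up for illustration, and requests never complete in this toy model.

```python
class LocalLeastConn:
    """Hypothetical LB that only knows about the requests it routed itself."""

    def __init__(self, backends):
        self.in_flight = {b: 0 for b in backends}

    def route(self):
        # Pick the backend this LB *believes* is least loaded.
        backend = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[backend] += 1
        return backend


def spread(num_lbs, num_requests, backends=("a", "b", "c")):
    """Send requests round-robin across the LBs; return the real per-backend load."""
    lbs = [LocalLeastConn(backends) for _ in range(num_lbs)]
    true_load = {b: 0 for b in backends}
    for i in range(num_requests):
        lb = lbs[i % num_lbs]
        true_load[lb.route()] += 1
    return true_load


print(spread(num_lbs=1, num_requests=4))  # one LB sees everything
print(spread(num_lbs=2, num_requests=4))  # two LBs, each with half the picture
```

With one LB the four requests land as 2/1/1 across the backends. With two LBs, both independently decide "a" is idle, then both decide "b" is idle, so the real load is 2/2/0 and "c" gets nothing.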

Some offer approaches like measuring response time, but that doesn't work so great as soon as you consider APIs where no two requests perform the same. Was it a fast request that got answered slowly (back-end overloaded?), or a slow request that got answered quickly (back-end bored?)? Who knows.

For the service I was working on a few years ago, no two requests were the same by any stretch of the imagination, even for the same API call; they varied in both request size and the computational power required to process them.

As you'd expect, traditional load balancer behaviour actually handled things to an okay degree probably about 90% of the time. That other 10% was a real killer though.

1 comments

stevekemp 2455 days ago

I have similar experience. I once worked with a pretty big site which used a load balancer configured to route traffic to whichever server had the best average response time.

The intention was that if a server was returning results "quickly" that meant it was least-loaded, and could handle the newest requests.

What it actually meant, though, was that when a server's disk filled up, it started returning "500 Internal Server Error" responses. Very quickly.

By the time the alarms were raised, almost all incoming traffic had been routed to this dead/dying host.
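A rough sketch of that failure mode, assuming the LB simply picks the backend with the lowest running-average latency and never looks at status codes. The server names, the latency numbers, and the `simulate` helper are invented for illustration and are not the actual setup.

```python
import random


def pick_fastest(avg_latency):
    """Return the backend with the lowest observed average latency."""
    return min(avg_latency, key=avg_latency.get)


def simulate(requests=1000, seed=0):
    rng = random.Random(seed)
    # Hypothetical fleet: "web3" has a full disk and fails almost instantly.
    backends = {"web1": "healthy", "web2": "healthy", "web3": "broken"}
    avg_latency = {name: 0.0 for name in backends}  # running average, in ms
    served = {name: 0 for name in backends}
    errors = 0
    for _ in range(requests):
        name = pick_fastest(avg_latency)
        if backends[name] == "broken":
            latency, status = 1.0, 500  # fails fast, so it looks "quick"
        else:
            latency, status = rng.uniform(50, 200), 200  # real work takes time
        served[name] += 1
        # Incremental update of the running average for this backend.
        avg_latency[name] += (latency - avg_latency[name]) / served[name]
        if status == 500:
            errors += 1
    return served, errors


served, errors = simulate()
print(served, errors)
```

Each healthy server gets exactly one request before the broken one shows up as the "fastest", after which it receives everything else. Counting a 5xx as a failure rather than as a fast response is the obvious mitigation in a sketch like this.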


bluedino 2455 days ago

I had almost the same thing happen once - but the server was serving requests 'very quickly' because it was caching everything based on the LB requesting it.
