by 411111111111111 1154 days ago
It should happen in the real world as well; at least that's what I was told when I started my first job as a system admin.

The reason people cited to me back then was that the balancer usually isn't particularly smart when balancing: it only sees a free node, so every new request is routed to it. The errors (mostly timeouts) happen once the requests actually start being processed.

Normally, a node receives a steady stream of requests over time, so its load is roughly constant (generally speaking, a request needs the most resources at the same relative point in its lifecycle, and in steady state the requests on a node are spread across different points). On a freshly added node all requests are new, so they all hit the same load bottleneck at the same time, which causes the timeouts.
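To make the effect concrete, here is a toy simulation, not a model of any real balancer. The demand profile `PROFILE`, the function `node_load`, and the tick counts are all made-up assumptions for illustration. It compares one request arriving per tick against five requests arriving at once on an empty node.

```python
# Hypothetical per-request resource demand at each tick of its lifecycle.
# The spike in the middle stands in for the "most expensive" phase.
PROFILE = [1, 1, 4, 1, 1]


def node_load(arrival_ticks, horizon):
    """Return total demand on one node per tick, given request arrival ticks."""
    load = [0] * horizon
    for start in arrival_ticks:
        for offset, demand in enumerate(PROFILE):
            if start + offset < horizon:
                load[start + offset] += demand
    return load


# Established node: one request arrives every tick, so lifecycles overlap
# at different phases and the load flattens out.
steady = node_load(arrival_ticks=range(10), horizon=10)

# Fresh node: five requests are routed to it in the same tick, so all of
# them reach the expensive phase together.
burst = node_load(arrival_ticks=[0] * 5, horizon=10)

print(steady)  # [1, 2, 6, 7, 8, 8, 8, 8, 8, 8]
print(burst)   # [5, 5, 20, 5, 5, 0, 0, 0, 0, 0]
```

The steady node settles at a load of 8 per tick, while the fresh node peaks at 20 even though it is handling only five requests in total. If the node's capacity sits somewhere between those two numbers, only the burst case times out.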

The answer is to aggressively scale horizontally and then quickly decommission nodes until you're back to baseline.

Or just accept the failed requests.

It's been over 10 years though, so it might've been improved since.


I don't know anything about this subject, but my first thought (which may be wrong) would be to just set the weight of the new server to match one of the other servers that are receiving messages (perhaps one of the lower-ranked ones). That way, it would not be overloaded so easily and could adjust its ranking after a while.
I guess my explanation was lacking then, as that wouldn't help. Reducing the weight below that of the old nodes might work, but it would also extend the time you're overloaded, which would also cause requests to fail.
That makes sense. I guess there's no simple fix for it.
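
The trade-off in the weight discussion above can be sketched with a little arithmetic. This assumes a simple weighted split in which traffic is divided in proportion to node weights and the new node's weight ramps up linearly. The function `new_node_share` and its parameters are hypothetical and not taken from any particular balancer.

```python
def new_node_share(elapsed, ramp_seconds, old_nodes, full_weight=1.0):
    """Fraction of traffic sent to the new node under a linear weight ramp.

    Assumes traffic is split in proportion to weight and that every old
    node keeps `full_weight` for the whole ramp.
    """
    weight = full_weight * min(1.0, elapsed / ramp_seconds)
    return weight / (weight + old_nodes * full_weight)


# Three old nodes, new node ramping up over 60 seconds.
for elapsed in (0, 30, 60):
    share = new_node_share(elapsed, ramp_seconds=60, old_nodes=3)
    old_share = (1 - share) / 3  # what each old node still has to carry
    print(elapsed, round(share, 3), round(old_share, 3))
```

At the start of the ramp each old node still carries a third of the traffic, and it only drops to a quarter once the ramp finishes. A lower weight protects the new node, but the already overloaded nodes stay overloaded for longer, which is the objection raised in the reply.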