Hacker News new | ask | show | jobs
by jhspaybar 4867 days ago
I can chime in here that I have had similar experiences at another large scale place :). Some requests would take a second or more to complete with the vast majority finishing in under 100MS. A solution was put in place that added about 5 MS to the average request, but also crushed the long tail(it just doesn't even exist anymore) and everything is hugely more stable and responsive.
2 comments

Let's assume that it is unacceptable to have each dyno tell the router each time it finishes a request ( http://news.ycombinator.com/item?id=5217157 ). And also that the goal is to reduce worst-case latency. And also that we don't know a priori how many requests each dyno should queue before it is 'full' and rejects further requests ( http://news.ycombinator.com/item?id=5216771 ).

Proposal:

1) Once per minute (or less often if you have a zillion dynos), each dyno tells the router the maximum number of requests it had queued at any time over the past minute.

2) Using that information, the router recalculates a threshold once a minute that defines how many queued requests is "too many" (e.g. maybe if you have n dynos, you take the log(n)th-busiest-dyno's load as the threshold -- you want the threshold to only catch the tail).

3) When each request is sent to a dyno, a header fields is added that tells the dyno the current 'too many' threshold.

4) If the receiving dyno has too many, it passes the request back to the router, telling the router that it's busy ( http://news.ycombinator.com/item?id=5217157 ). The 'busy' dyno remembers that the router thinks it is 'busy'. The next time its queue is empty, it tells the router "i'm not busy anymore" (and repeats this message once per minute until it receives another request, at which point it assumes the router 'heard').

5) When a receiving dyno tells the router that it is busy, the router remembers this and stops giving requests to that dyno until the dyno tells it that it is not busy anymore.

I haven't worked on stuff like this myself, do you think that would work?

How was 5 ms added? Multiple sleep states per request?

I imagine the long tail disappears in a similar way that a traffic jam is prevented by lowering the speed limit.

I think you misunderstood: they optimized the long running requests and the optimization incurred 5ms performance loss for short requests. It is not that the additional 5ms solved the problem.