Hacker News new | ask | show | jobs
by ericmj 4842 days ago
What is wrong with the random routing they are doing now? Random routing works if your dynos can handle concurrent requests.
2 comments

Random routing seems okay if you think only about busy servers under light load. However, one of the problems is it creates light servers when under heavy over-all load. Any time a server is idle during a high overall demand period is work opportunity that is irreversibly lost. I made this argument in the single-thread 2-server case here http://www.win-vector.com/blog/2013/02/randomized-algorithms... , but he principle applies much more generally.
It requires a much larger number of dynos to achieve similar performance as the same setup with intelligent routing.
why? do you have any proof/calculations? what you're stating seems unintuitive to me. would't you expect intelligent routing and random routing to converge in the limit of many concurrent processes?
The issue is that while the average time is going to converge - users aren't interested in the average time!

Consider the following scenario:

  - Two types of requests, A and B
  - Request A takes 200ms to process
  - Request B takes 10 seconds to process
  - Each dynamo can take 4 concurrent requests
If you are receiving 1000s of requests per minute, it is very likely that you will eventually allocate more than 4 request B to a single dynamo. One this has happened, that dynamo is now locked for 10 seconds. All of the request As that we route to that dynamo will take over 10 seconds to return.

If request A is a credit card transaction, and request B is just some data lookup with a loading UI spinner, then every time this occurs our poor app has lost money as the user has navigated away from our app before the transaction (request A) can complete! Ouch!

Intelligent routing solves this by ensuring that request A will only go to a dynamo that is open, thereby ensuring no user will have to wait 10 seconds for their quick credit card transaction.

The take away here is that the important measure is the slowest user facing request, not the average request time across all requests.

How about factoring your A-type requests and B-type requests into two separate apps?

Oh, wait, Heroku already tells you to do this: the B-type requests are why they introduced "workers." Heroku's recommendation has always been that you're supposed to write your "app servers" to serve A-type requests directly, while asynchronously queuing your B-type requests to be consumed by workers. You then either poll the app server for the completion state of the long requests, or have the worker queue a return-value paired to the request-ID, that the app server can dequeue and return along with the next request. (Erlang's process-inbox-based message-passing semantics, basically.)

To put it another way, it's the old adage of "don't do long calculations on the UI thread." In this case we have a Service-Oriented Architecture, so we've got a UI service--but we still don't want it to block. By default, Heroku basically exposes Unix platform semantics; unless you wrap those with Node or Erlang, you have to deal with how Unix does SOA: multiple daemons, and passing messages over sockets. Heroku could "intelligent-route" all they want, but there's no level of magic that can overcome applications designed in ignorance of how concurrent SOA architecture works on the platform you're designing for.

However, the more concurrency your stack can do, the less this matters. Assuming the probability of a node getting assigned a long request is 1/5, the probability of a node getting assigned 4 of them is (1/5)^4 = 1/625. If your stack can do 20 concurrent requests, it's (1/5)^20 = 1/95367431640625.[1]

Note that this overlooks the fact that a node that has requests queued only drains them at a certain rate, as long requests tend to pile up. However, if your framework is truly concurrent and your nodes are not CPU bound, but rather waiting for other services, it's possible to get much higher concurrency than just 20.

[1] http://www.wolframalpha.com/input/?i=%281%2F5%29%5E20

Yes, the "just buy more dynos even when better routing would let you get by with less" strategy. I can see the appeal in this, for Heroku.
You misunderstood my reply. It means that each dyno does more work by being concurrent, not buying more dynos.

It's possible to do several hundred requests per second on a single dyno; just because Rails doesn't allow you do to that due to not being concurrent doesn't mean that other stacks can't.

This guy analyzed the problem in detail: http://rapgenius.com/James-somers-herokus-ugly-secret-lyrics
In the limit of many concurrent processes, yes, this does converge. However, not even concurrent servers have truly independent response times: unbalanced load will slow down a server. Balanced routing is almost always more optimal, if you can pay the cost of maintaining that state. The problem is only aggravated by servers which can only do one thing at a time.

do you have any proof/calculations?

Sure. http://aphyr.com/posts/278-timelike-2-everything-fails-all-t...

There was a fairly persuasive simulation posted on HN awhile ago that shows that random is the worst load balancing algorithm.

There has to be some happy medium between "embarrassingly parallel but shitty performance" and "intelligent routing that can't scale".

“Intelligent routing” is round robin and requires knowledge of the number of request handlers. It's not intelligent at all, it just means that you are ruling out high concurrency servers.

If you would extend the whole thing to have a negotiation protocol that lets the load balancer know how loaded the individual backend processes are then you could start talking about intelligent routing. That however currently does not exist in the Ruby world.