This has all been solved previously. In Google App Engine the scheduler is aware of, for each instance:

* the type of instance it is
* the amount of memory currently being used
* the amount of CPU currently being used
* the time of the last request handled by that instance

It also tracks the profile of your application and applies a scheduling algorithm based on what it has learned. E.g. the URL /import may take 170MB and 800ms to run on average, so it would be scheduled on an instance that has more resources available. It does all this before the requests run.

You can find more docs on it here: https://developers.google.com/appengine/docs/adminconsole/in...

E.g.:

> Each instance has its own queue for incoming requests. App Engine monitors the number of requests waiting in each instance's queue. If App Engine detects that queues for an application are getting too long due to increased load, it automatically creates a new instance of the application to handle that load

This is what it looks like from a user's point of view: http://i.imgur.com/QFMXeT1.png

Heroku essentially needs to build all of that. The way it is solved is that the network round trips to poll the instances run in parallel with the scheduler. You don't do the following in sequence for every request:

* accept request
* poll scheduler
* poll instance/dyno
* serve request
* update scheduler
* update instance/dyno

This all happens asynchronously. At most your data is 10ms out of date. It would also use a very lightweight UDP-based protocol and would broadcast rather than round-trip: you send the data frequently enough, with a checksum, that a single failure doesn't really matter (at worst it delays a request or two).
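
To make the idea concrete, here is a rough Python sketch of the scheme described above: instances push small checksummed status datagrams, the scheduler just keeps the latest state it has heard, and routing a request is a local table lookup with no round trip. Everything in it is an assumption for illustration (the wire format, the names `encode_status`, `decode_status` and `Scheduler`, the "least CPU among instances with enough free memory" rule); it is not how App Engine or Heroku actually implement this.

```python
import struct
import zlib
from dataclasses import dataclass

# Hypothetical wire format: instance id, free memory (MB), CPU load (%),
# followed by a CRC32 of those fields. Small enough for one UDP datagram.
_PAYLOAD = struct.Struct("!IHB")
_CHECKSUM = struct.Struct("!I")


@dataclass
class InstanceState:
    free_mb: int
    cpu_pct: int


def encode_status(instance_id, free_mb, cpu_pct):
    """Build the datagram an instance would broadcast every few ms."""
    payload = _PAYLOAD.pack(instance_id, free_mb, cpu_pct)
    return payload + _CHECKSUM.pack(zlib.crc32(payload))


def decode_status(datagram):
    """Return (instance_id, free_mb, cpu_pct), or None if the datagram is damaged."""
    if len(datagram) != _PAYLOAD.size + _CHECKSUM.size:
        return None
    payload, tail = datagram[:_PAYLOAD.size], datagram[_PAYLOAD.size:]
    (checksum,) = _CHECKSUM.unpack(tail)
    if zlib.crc32(payload) != checksum:
        return None
    return _PAYLOAD.unpack(payload)


class Scheduler:
    def __init__(self, profiles):
        # Learned profile: route -> expected memory in MB (e.g. "/import": 170).
        self.profiles = profiles
        self.instances = {}

    def on_datagram(self, datagram):
        """Called off the request path. A bad datagram is simply dropped,
        since the next broadcast will replace it anyway."""
        status = decode_status(datagram)
        if status is None:
            return False
        instance_id, free_mb, cpu_pct = status
        self.instances[instance_id] = InstanceState(free_mb, cpu_pct)
        return True

    def pick(self, route, default_mb=64):
        """Pick the least CPU-loaded instance with enough free memory.
        Uses only local (slightly stale) state, so no polling is needed."""
        needed = self.profiles.get(route, default_mb)
        candidates = [
            (state.cpu_pct, instance_id)
            for instance_id, state in self.instances.items()
            if state.free_mb >= needed
        ]
        if not candidates:
            return None  # a real system would queue or spin up an instance here
        return min(candidates)[1]
```

In a real deployment `on_datagram` would be fed by a UDP socket loop running alongside the request path, and `pick` would be called per request. The socket part is left out here to keep the sketch short.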
While F5 and similar offer nice hardware for that, I'm not sure if their hardware (or HAProxy's software) supports the type of architecture used by Heroku (many heterogeneous workers running wildly different applications, with dynamic association of worker to machine, etc.).