I doubt they want every inbound request to require:
* query remote redis for lowest-connection-count dyno(s) (from among potentially hundreds): 1 network roundtrip
* increment count at remote redis for chosen dyno: 1 network roundtrip (maybe can be coalesced with above?)
* when connection ends, decrement count at remote redis for chosen dyno: 1 network roundtrip
That's 2-3 extra roundtrips each inbound request, and new potential failure modes and bottlenecks around the redis instance(s). And the redis instance(s) might need retuning as operations scale and more state is needed. Random routing lets a single loosely-consistent (perhaps distributed) table of 'up' dynos, with no other counter state, drive an arbitrarily large plant of simple, low-state routers.
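To make the roundtrip arithmetic concrete, here is a rough Python sketch contrasting the two approaches. Everything in it is made up for illustration: `CounterStore` is an in-memory stand-in for the remote redis that just counts simulated roundtrips, and the class and dyno names are hypothetical, not anything Heroku actually runs.

```python
import random


class RandomRouter:
    """Random routing: the only state needed is the list of 'up' dynos."""

    def __init__(self, up_dynos, rng=None):
        self.up_dynos = list(up_dynos)
        self.rng = rng or random.Random()

    def route(self):
        # Purely local decision, no shared counters consulted.
        return self.rng.choice(self.up_dynos)


class CounterStore:
    """Hypothetical stand-in for a shared remote store such as redis.

    Each method call represents one network roundtrip.
    """

    def __init__(self, dynos):
        self.counts = {d: 0 for d in dynos}
        self.round_trips = 0

    def least_loaded(self):
        self.round_trips += 1
        return min(self.counts, key=self.counts.get)

    def incr(self, dyno, delta):
        self.round_trips += 1
        self.counts[dyno] += delta


class LeastConnRouter:
    """Least-connections routing backed by the shared counter store."""

    def __init__(self, store):
        self.store = store

    def route(self):
        dyno = self.store.least_loaded()  # roundtrip 1
        self.store.incr(dyno, +1)         # roundtrip 2
        return dyno

    def finish(self, dyno):
        self.store.incr(dyno, -1)         # roundtrip 3
```

Routing and finishing one request through `LeastConnRouter` touches the store three times, while `RandomRouter.route()` never leaves the process. That is the whole trade-off in miniature: better placement versus extra shared state on the request path.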
* the type of instance it is
* the amount of memory currently being used
* the amount of CPU currently being used
* the last request time handled by that instance
It also tracks the profile of your application and applies a scheduling algorithm based on what it has learned. For example, the URL /import may take 170MB and 800ms to run on average, so the scheduler would send it to an instance that has more resources available.
It does all this before the request runs.
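As a rough illustration of that kind of profile-based placement, here is a minimal Python sketch. It is not App Engine's actual algorithm, which I don't know in detail. The `Instance` and `Profile` types, the default profile numbers, and the "least CPU among instances with enough free memory" rule are all assumptions; only the /import figures come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    free_mem_mb: int
    cpu_used: float  # fraction of CPU in use, 0.0 to 1.0


@dataclass
class Profile:
    avg_mem_mb: int
    avg_ms: int


# Learned per-URL averages; the /import numbers are from the example above.
PROFILES = {"/import": Profile(avg_mem_mb=170, avg_ms=800)}
# Hypothetical fallback for URLs with no learned profile yet.
DEFAULT_PROFILE = Profile(avg_mem_mb=20, avg_ms=50)


def pick_instance(url, instances):
    """Pick an instance before the request runs, using the URL's profile."""
    profile = PROFILES.get(url, DEFAULT_PROFILE)
    # Keep only instances with enough free memory for this kind of request.
    candidates = [i for i in instances if i.free_mem_mb >= profile.avg_mem_mb]
    if not candidates:
        return None  # caller would queue the request or start an instance
    # Assumed tie-break: prefer the instance with the least CPU in use.
    return min(candidates, key=lambda i: i.cpu_used)
```

With instances that have 100MB, 256MB and 512MB free, a request for /import skips the first one because 170MB would not fit, while a cheap request can still land on it.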
You can find more docs on it here:
https://developers.google.com/appengine/docs/adminconsole/in...
For example:
> Each instance has its own queue for incoming requests. App Engine monitors the number of requests waiting in each instance's queue. If App Engine detects that queues for an application are getting too long due to increased load, it automatically creates a new instance of the application to handle that load
This is what it looks like from a user point of view:
http://i.imgur.com/QFMXeT1.png
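The queue rule quoted from the docs can be sketched in a few lines of Python. This is only a toy model of the idea: the `MAX_QUEUE_LEN` threshold, the `App` class and the instance naming are all hypothetical, and the real service presumably uses much richer signals than raw queue length.

```python
from collections import deque

MAX_QUEUE_LEN = 5  # hypothetical threshold for "queue is getting too long"


class App:
    """Toy model: one request queue per instance, scale out when queues fill."""

    def __init__(self):
        self.queues = {"instance-0": deque()}

    def enqueue(self, request):
        # Look at the instance with the shortest queue first.
        name = min(self.queues, key=lambda n: len(self.queues[n]))
        if len(self.queues[name]) >= MAX_QUEUE_LEN:
            # Even the shortest queue is full: create a new instance.
            name = f"instance-{len(self.queues)}"
            self.queues[name] = deque()
        self.queues[name].append(request)
        return name
```

Feeding six requests into a fresh `App` puts the first five on `instance-0` and causes the sixth to start `instance-1`.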
Heroku essentially needs to build all of that. The way this is solved is that the network roundtrips that poll the instances run in parallel with the scheduler rather than inline with each request. You don't do:
* accept request
* poll scheduler
* poll instance/dyno
* serve request
* update scheduler
* update instance/dyno
This all happens asynchronously. At most your data is 10ms out of date. It would also use a very lightweight UDP-based protocol and would broadcast rather than round-trip: you send the data frequently enough, with a checksum, that a single lost packet doesn't really matter. At worst it delays a request or two.
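Here is a minimal Python sketch of what I mean, under some loud assumptions: the datagram layout (a 4-byte CRC32 followed by a JSON body), the field names and the `StatusTable` class are all invented for illustration. In a real setup the datagrams would arrive on a UDP socket read by a background thread; here they are passed in directly so the example stays self-contained.

```python
import json
import zlib


def encode_status(dyno, connections):
    """Build one status datagram: 4-byte CRC32 checksum + JSON body."""
    body = json.dumps({"dyno": dyno, "connections": connections}).encode()
    return zlib.crc32(body).to_bytes(4, "big") + body


def decode_status(datagram):
    """Return the status dict, or None if the datagram is short or corrupt."""
    if len(datagram) <= 4:
        return None
    checksum, body = datagram[:4], datagram[4:]
    if zlib.crc32(body).to_bytes(4, "big") != checksum:
        return None
    return json.loads(body)


class StatusTable:
    """Router-local cache of dyno load, refreshed by incoming broadcasts."""

    def __init__(self):
        self.connections = {}

    def on_datagram(self, datagram):
        status = decode_status(datagram)
        if status is not None:
            self.connections[status["dyno"]] = status["connections"]
        # A bad or lost datagram is simply skipped; the next broadcast
        # from that dyno replaces the stale entry.

    def pick(self):
        # Local lookup only: no network roundtrip on the request path.
        return min(self.connections, key=self.connections.get)
```

The request path only ever calls `pick()`, which reads the local table. Dropped or corrupted updates just leave an entry slightly stale until the next broadcast arrives.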