by nikcub 4867 days ago
This has all been solved previously. In Google App Engine the scheduler is aware of, for each instance:

* the type of instance it is

* the amount of memory currently being used

* the amount of CPU currently being used

* the last request time handled by that instance

It also tracks the profile of your application, and applies a scheduling algorithm based on what it has learned. For example, the URL /import may take 170MB and 800ms to run, on average, so it would schedule it with an instance that has more resources available.

It does all this prior to the requests running.
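A rough sketch of that idea, not App Engine's actual algorithm: keep a per-URL profile of average memory and latency, and route each request to the least-loaded instance that has enough free memory for it. The names `Instance`, `RouteProfile`, `pick_instance` and all the numbers except the /import figures are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    name: str
    free_memory_mb: float
    cpu_load: float  # 0.0 (idle) to 1.0 (saturated)


@dataclass
class RouteProfile:
    avg_memory_mb: float
    avg_latency_ms: float


def pick_instance(instances: list, profile: RouteProfile) -> Optional[Instance]:
    """Return the least CPU-loaded instance with enough free memory, or None."""
    candidates = [i for i in instances if i.free_memory_mb >= profile.avg_memory_mb]
    if not candidates:
        return None  # nothing fits; a real scheduler would queue or spin up an instance
    return min(candidates, key=lambda i: i.cpu_load)


# Learned profiles (hypothetical values, apart from the /import example above).
PROFILES = {
    "/import": RouteProfile(avg_memory_mb=170, avg_latency_ms=800),
    "/": RouteProfile(avg_memory_mb=20, avg_latency_ms=30),
}

instances = [
    Instance("a", free_memory_mb=100, cpu_load=0.1),
    Instance("b", free_memory_mb=512, cpu_load=0.6),
    Instance("c", free_memory_mb=256, cpu_load=0.4),
]
```

Here /import skips instance "a" even though it is the least busy, because it does not have 170MB free.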

You can find more docs on it here:

https://developers.google.com/appengine/docs/adminconsole/in...

For example:

> Each instance has its own queue for incoming requests. App Engine monitors the number of requests waiting in each instance's queue. If App Engine detects that queues for an application are getting too long due to increased load, it automatically creates a new instance of the application to handle that load
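A toy model of the behaviour in that quote, assuming a fixed queue-depth threshold (the docs don't state how "too long" is decided, so `MAX_QUEUE_DEPTH` and the `Pool` class are hypothetical):

```python
from collections import deque

MAX_QUEUE_DEPTH = 5  # hypothetical threshold for "queue is getting too long"


class Pool:
    """One request queue per instance; adds an instance when all queues are full."""

    def __init__(self):
        self.queues = [deque()]  # start with a single instance

    def enqueue(self, request):
        shortest = min(self.queues, key=len)
        if len(shortest) >= MAX_QUEUE_DEPTH:
            # Every queue is at the threshold: create a new instance for the load.
            shortest = deque()
            self.queues.append(shortest)
        shortest.append(request)
```

Pushing six requests into a fresh `Pool` leaves five on the first instance and starts a second instance for the sixth.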

This is what it looks like from a user point of view:

http://i.imgur.com/QFMXeT1.png

Heroku essentially needs to build all of that. The way it is solved is that the network round trips to poll the instances run in parallel with the scheduler rather than inside the request path. You don't do:

* accept request

* poll scheduler

* poll instance/dyno

* serve request

* update scheduler

* update instance/dyno

This all happens asynchronously. At most your data is 10ms out of date. It would also use a very lightweight UDP-based protocol and would broadcast rather than round-trip: you send the data frequently enough, with a checksum, that a single failure doesn't really matter. At worst it delays a request or two.
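Roughly what such a status datagram could look like, as a sketch only: the wire format (instance id, free memory, CPU load, CRC32) and the function names are my own invention, not anything Heroku or Google document. The sender fires and forgets; the receiver silently drops anything that fails the checksum and waits for the next update.

```python
import socket
import struct
import zlib

_BODY = struct.Struct("!HIf")  # instance id, free memory (MB), CPU load (0.0-1.0)
_CRC = struct.Struct("!I")     # CRC32 of the body


def encode_status(instance_id, free_memory_mb, cpu_load):
    """Pack one status update with a trailing checksum."""
    body = _BODY.pack(instance_id, free_memory_mb, cpu_load)
    return body + _CRC.pack(zlib.crc32(body))


def decode_status(datagram):
    """Return (instance_id, free_memory_mb, cpu_load), or None if the datagram is damaged."""
    if len(datagram) != _BODY.size + _CRC.size:
        return None
    body, crc_bytes = datagram[:_BODY.size], datagram[_BODY.size:]
    (crc,) = _CRC.unpack(crc_bytes)
    if zlib.crc32(body) != crc:
        return None  # drop it; a fresh update arrives a few ms later anyway
    return _BODY.unpack(body)


def send_status(sock, addr, instance_id, free_memory_mb, cpu_load):
    """Fire-and-forget: no ack, no retry."""
    sock.sendto(encode_status(instance_id, free_memory_mb, cpu_load), addr)
```

Each instance would call `send_status` on a timer with a `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` socket, and the scheduler would keep only the latest valid update per instance.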

3 comments

A big problem is that the newer stack is not homogeneous: the applications deployed on dynos have much, much bigger variability than the old "Rails/Rack only" stack of Heroku. Meanwhile the GAE stack is fully controlled by Google and reuses, afaik, their impressive "Google scale" toolchest, which runs from replacement naming systems through monitoring, IPC, custom load balancing etc.

While F5 and similar offer nice hw for that, I'm not sure if their hw (or HAProxy's software) supports the architecture type used by Heroku (many heterogeneous workers running wildly different applications, with dynamic association of worker to machine etc.).

> It also tracks the profile of your application, and applies a scheduling algorithm based on what it has learned. For example, the URL /import may take 170MB and 800ms to run, on average, so it would schedule it with an instance that has more resources available.

That is very awesome technology, but is something like that available for non-Google people?

Expensive commercial appliances like the popular F5 BIG-IP can do this, and that is what a lot of large-scale websites use:

http://www.f5.com/glossary/load-balancer/

In terms of open source, HAProxy has layer 7 algorithms but they are much simpler:

http://cbonte.github.com/haproxy-dconv/configuration-1.5.htm...

If you were inclined, you could write an algorithm to implement something similar in one of the open source routers.

Sounds nice, but I'm not sure it's the only way -- that Heroku 'essentially needs to build all of that'. It'd be interesting to see whose routing-to-instance is faster in the non-contended case, between Heroku and GAE. Do you know of any benchmarks?

Don't know of any benchmarks, but I have/had a number of projects on App Engine and it is very good (but expensive). I would look to include Elastic Beanstalk in a comparison as well, as it has been gaining popularity since it launched (it doesn't have the lock-in and supports any environment).