At this point we've done so much optimization of our app that our requests are no longer CPU- or IO-bound (that work has been offloaded to backend processes through a Rabbit message queue), and we still get H12 errors and random slowness. As I write this, we've had 2 H12 errors in the last 10 minutes. It should be zero.
I'm still confused: since you're using Node.js, I imagine your dynos are effectively handling a large number of concurrent requests. This should in turn negate any impact of long-running requests, since they don't cause further requests to be queued in any way. So where are requests (or responses) being queued (or lost) in your app? In Rabbit? Are you getting errors and slowness in streaks rather than as isolated random events? Could it be due to the spin-up time of new backend workers, or something along those lines?
Node handles one request at a time. It isn't multithreaded. It will receive a request, process that request and return a response. If another request comes in while a request is being processed, it is queued until the currently processing request is finished. I googled around; here is a good explanation: http://howtonode.org/understanding-process-next-tick
The way my application worked was that 'long'-running things, like saving data to a database, happened before we returned a response to the client (in this case an iPhone app). Sometimes, because of Mongo, the dyno, networking, the phase of the moon, talking to the Facebook API, etc., processing would get 'slow' and it would take a few seconds for a response to make it to the client. As soon as this happened on a heavily loaded system, the Heroku router would get backed up (since it only routes to 2-3 dynos at a time) and would start throwing H12 errors.
So we rewrote the entire app to do minimal data processing in the web tier and send the response back to the client as quickly as possible. At the same time, we send out a Rabbit queue message containing all the instructions needed to process the data 'offline' in a worker task. There is no spin-up time, since these workers are running all the time. We even have several groups of workers, split by message type, so that we can segregate the work across multiple groups of worker dynos. This also lets us easily scale to more than 100 dynos to process messages. It works great. Rabbit is a godsend.
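The "respond fast, defer the heavy work" pattern described above can be sketched roughly as follows. This is a minimal illustration, not the poster's code: an in-process `asyncio.Queue` stands in for the RabbitMQ queue, the slow database/API work is simulated with `asyncio.sleep`, and the names `handle_request`, `worker` and `main` are hypothetical.

```python
import asyncio
import contextlib

# Hypothetical sketch: asyncio.Queue stands in for a RabbitMQ queue, and the
# slow "save to database" work is simulated with asyncio.sleep.


async def handle_request(queue: asyncio.Queue, payload: dict) -> dict:
    """Web tier: do minimal work, enqueue the heavy part, respond immediately."""
    await queue.put({"type": "save", "data": payload})
    return {"status": "accepted"}


async def worker(queue: asyncio.Queue, processed: list) -> None:
    """Always-running background worker (no spin-up) that drains the queue."""
    while True:
        message = await queue.get()
        await asyncio.sleep(0.05)  # simulated slow DB / external API work
        processed.append(message["data"])
        queue.task_done()


async def main() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    worker_task = asyncio.create_task(worker(queue, processed))

    # The client gets its responses before any of the slow work has happened.
    responses = [await handle_request(queue, {"id": i}) for i in range(3)]
    processed_at_response_time = list(processed)

    await queue.join()  # wait for the worker to finish the backlog
    worker_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker_task

    return {
        "responses": responses,
        "processed_at_response_time": processed_at_response_time,
        "processed": processed,
    }


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The design point is that request latency no longer depends on how slow Mongo or the Facebook API is on a given call; that variance moves to the workers, which can be scaled independently.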
I say 'long' and 'slow' above because the longest we should be taking is a couple of seconds. Unfortunately, the Heroku router's design is fundamentally broken. As soon as a lot of 'slow' requests go to the same dynos, they start to stack up and the router just starts returning H12 errors. It doesn't matter how many dynos you have, because the router only talks to 2-3 dynos at a time. We get H12s with 50, 100, 200, 300, etc. dynos.
We also saw very strange behavior with the dynos. We use Nodetime to log how long things take, and we'd see Redis/Mongo calls take only a few ms but over 15 s for the request as a whole to complete. Somewhere things are slow, and we can't figure out where. Until this whole mess came out, Heroku just pointed fingers at everyone but themselves.
Oh, by the way, once you get to around 200-300 dynos, deploys start erroring out as well, because Heroku can't start enough dynos fast enough and the whole process times out too. You can't tell whether a deploy worked or not. They didn't seem to care about that at all, either.
Anyway, I could keep going... but once again, I'm glad that the Rapgenius guys are calling Heroku out in public on this stuff. There are some big issues here that need to be addressed, and the H12/router problem is the biggest one. I'm looking forward to seeing how they pull out of this one.
> Node handles one request at a time. It isn't multithreaded. It will receive a request, process that request and return a response. If another request comes in at the same time another request is in process, it is queued until the currently processing request is finished.
I'm missing something here. Node does not multithread requests, but it surely can process many requests simultaneously if those requests are waiting on async operations: usually database calls, external APIs, or other kinds of I/O. That's the very core idea of evented servers.
So my model is that if, e.g., a Node process receives 100 requests over a period of 1 second, and each request takes 3 seconds to process but spends most of that time waiting on async operations, then each response will be sent back roughly 3 seconds after its request arrived, with no queuing to speak of.
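The model above can be checked with a small experiment. This sketch uses Python's `asyncio`, which, like Node, runs a single-threaded event loop; it is an analogy, not Node code. Each hypothetical `handle_request` spends its whole "processing time" awaiting simulated I/O (`asyncio.sleep`), scaled down from 3 s to 0.3 s so the demo runs quickly.

```python
import asyncio
import time

# Single-threaded event loop, analogous to Node's. Each simulated request
# spends all of its processing time awaiting I/O (asyncio.sleep stands in
# for a database or external API call), scaled down from 3 s to 0.3 s.


async def handle_request(request_id: int, io_wait: float) -> int:
    await asyncio.sleep(io_wait)  # waiting on I/O frees the loop for others
    return request_id


async def serve(n_requests: int = 100, io_wait: float = 0.3) -> float:
    start = time.perf_counter()
    ids = await asyncio.gather(
        *(handle_request(i, io_wait) for i in range(n_requests))
    )
    assert ids == list(range(n_requests))  # every request got a response
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(serve())
    # Expect a total close to a single request's 0.3 s, not 100 x 0.3 s.
    print(f"100 requests finished in {elapsed:.2f}s")
```

If requests were truly handled one at a time, the total would approach 100 × 0.3 s = 30 s; with an event loop, it stays near the duration of a single request, as long as the work is I/O-bound rather than CPU-bound.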
From your description, routers do not send multiple requests to the same dyno even if the dyno could handle them, and they only talk to a limited number of dynos. So queuing is happening in the routers, while the dynos idle away waiting on async operations.
This would be complementary to the problem described by Rapgenius, and would mean that the Heroku architecture does not play well with any type of server: neither evented (Node, yours) nor sequential (Rails, as shown by Rapgenius), nor presumably multithreaded or multiprocess servers (which effectively behave like evented ones to the outside world). A huge mess indeed!
> Node does not multithread requests, but it surely can process many requests simultaneously if these requests are waiting for async operations: database, external APIs or other types of I/O usually. That's the very core idea of evented servers.
Within a single request, Node can handle its calls to outside services (databases, APIs, etc.) asynchronously, but it is still only processing one request at a time. There is no 'synchronized' keyword in JavaScript. ;-)
There is an interesting header in there: X-Heroku-Dynos-In-Use. From what I understand, this header is the number of dynos that a router is communicating with. For us, this is always around 2-3.
I suspect that the router is just a dumb nginx process sitting in front of my app, set up to communicate with 2-3 of my dynos in round-robin fashion. If any one of those dynos doesn't process its requests fast enough, requests start to back up. Once they back up past 30 s worth of execution, the router just kills the queued requests instead of leaving them in the queue or sending them to another set of dynos. It's even worse if a dyno crashes (Node.js likes to crash at the first sign of an uncaught exception). I suspect that is why we see 2 or 3 in that header.
I think part of the problem is that the routers don't start talking to more dynos when you have them available. So it doesn't matter whether you have 50, 200 or 500 dynos, because the router is always only talking to a small subset of them. Even if you add more dynos in the middle of heavy traffic, you are still stuck with H12s on the existing dynos; a full system restart is then necessary.
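The router behavior hypothesized above can be put into a toy queueing model to see why a small dyno subset matters more than the total dyno count. This is a sketch under loudly stated assumptions, not Heroku's documented behavior (Heroku's docs, as noted later in the thread, describe random routing on Cedar): requests are assigned round-robin to `n_dynos`, each dyno serves one request at a time (the claim made earlier in the thread, which later replies dispute for Node but which holds for single-threaded Rails), and any request whose wait plus service time exceeds 30 s counts as an H12. The dyno is assumed to keep working through its backlog even after the router gives up. The function names `count_h12` and `scenario` are hypothetical.

```python
# Toy model of the hypothesized router: round-robin over a fixed set of
# dynos, each dyno serving one request at a time, 30 s timeout.
# All of this is an assumption for illustration, not Heroku's actual design.

TIMEOUT_S = 30.0  # H12 threshold discussed in the thread


def count_h12(arrivals: list[float], service_times: list[float], n_dynos: int) -> int:
    """Count requests whose queue wait + service time exceeds TIMEOUT_S."""
    free_at = [0.0] * n_dynos  # when each dyno will finish its current backlog
    h12 = 0
    for i, (arrive, service) in enumerate(zip(arrivals, service_times)):
        dyno = i % n_dynos                   # round-robin assignment
        start = max(arrive, free_at[dyno])   # wait behind earlier requests
        free_at[dyno] = start + service      # dyno keeps working even after a timeout
        if free_at[dyno] - arrive > TIMEOUT_S:
            h12 += 1
    return h12


def scenario(service_s: float, n_dynos: int,
             rate_per_s: float = 10.0, duration_s: float = 60.0) -> int:
    """Steady arrivals at rate_per_s for duration_s, all with the same service time."""
    n = int(rate_per_s * duration_s)
    arrivals = [i / rate_per_s for i in range(n)]
    return count_h12(arrivals, [service_s] * n, n_dynos)


if __name__ == "__main__":
    print("fast requests, 3 dynos: ", scenario(0.2, 3))
    print("slow requests, 3 dynos: ", scenario(2.0, 3))
    print("slow requests, 30 dynos:", scenario(2.0, 30))
```

Under these assumptions, 2 s requests arriving at 10 per second overwhelm a 3-dyno subset (each dyno's backlog grows without bound) while the same load spread over 30 dynos never times out, which is the commenter's point that adding dynos does not help if the router only uses a few of them.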
By 'processing' I mean that the Node application has received the request and has not yet sent the response, i.e. the connection is still alive. 'synchronized' has no bearing here. If request processing is purely CPU-bound with no async operations, then only one request will be processed at any time; otherwise Node will happily process up to thousands of requests simultaneously. This is the ideal use case for Node. It should be trivial to log the number of simultaneous requests being processed.
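Logging the number of simultaneous in-flight requests, as suggested above, only needs a counter incremented when a request is received and decremented when its response is sent. The sketch below shows the idea with Python's `asyncio` rather than Node; `InFlightCounter` and `handle_request` are hypothetical names, and the I/O is simulated with `asyncio.sleep`. (In Node one would typically decrement when the response's `finish` event fires.)

```python
import asyncio

# Hypothetical in-flight request counter. Because the event loop is
# single-threaded and there is no await between the read and the write,
# plain integer updates are safe here without locks.


class InFlightCounter:
    """Tracks requests received but not yet responded to, and the peak."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    def __enter__(self) -> "InFlightCounter":
        self.current += 1
        self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc_info) -> bool:
        self.current -= 1
        return False  # do not swallow exceptions


async def handle_request(counter: InFlightCounter, io_wait: float) -> None:
    with counter:
        await asyncio.sleep(io_wait)  # simulated async I/O (DB, external API)


async def main(n_requests: int = 50) -> InFlightCounter:
    counter = InFlightCounter()
    await asyncio.gather(*(handle_request(counter, 0.1) for _ in range(n_requests)))
    return counter


if __name__ == "__main__":
    c = asyncio.run(main())
    print(f"peak in-flight requests: {c.peak}, still in flight: {c.current}")
```

With purely I/O-bound handlers all 50 requests are in flight at once, so the peak equals the number of requests; a CPU-bound handler with no awaits would never push the peak above 1.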
According to the Heroku docs, Cedar routers do not do any queuing and just send requests immediately to a random dyno. They are pretty clear on this in multiple places, specifically when talking about concurrent requests in Node. They also mention a 'routing mesh', which suggests there are many routers doing their thing. But the header you see may not be relevant to Cedar, just as the other header, 'X-Heroku-Queue-Depth', should not apply to Cedar either.
> Within a single request, Node can handle its calls to outside services (databases, APIs, etc.) asynchronously, but it is still only processing one request at a time. There is no 'synchronized' keyword in JavaScript. ;-)
This is as wrong as it can get. Multiple requests are processed "concurrently", in the sense that another request gets served as soon as an existing request is waiting on async calls. This is different from thread-based concurrency, and thus there is no need for things like a "synchronized" keyword.