by Ramone 4490 days ago
I actually think I understood you, but my point is that in the very case where statelessness should be an advantage, node.js is a much less fault-tolerant environment than most other web application servers. Most other web app servers offer

(1) request isolation (so most failures in one request can't break other requests) and

(2) a way to catch all exceptions/errors in a single request (and domains don't accomplish this, unless you know where to expect errors from, or wrap everything).

Since node.js doesn't offer those features, it's not even as fault tolerant as PHP was 15 years ago. I'm a huge fan of node.js, but one of the hardest things to do on a large application with a large number of users is to keep a server instance from restarting and dropping all the other in-progress requests. If you write your node.js code to be crash-only (like one might do with Erlang), your clients are going to have a terrible time.


We had the problem you're describing for a while, but have since figured out how to avoid processes going down and interrupting other requests. Essentially, you attach a global domain, and when that domain catches an error you stop accepting new connections (obviously you have to be load-balancing between processes) and start a countdown. Some reasonable amount of time later (I think we wait 30 seconds?) you assume that any in-progress request is done and restart the process. We've found this to be very successful.
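
Roughly, that pattern could look like the sketch below. This is only an illustration under assumptions, not our actual code: `makeDrainOnError` and `startServer` are made-up names, the 30 second default just mirrors the number mentioned above, and the restart itself is assumed to be done by whatever supervises the process (cluster master, process manager, etc.).

```javascript
const domain = require('domain');
const http = require('http');

// Hypothetical helper: builds an error handler that stops accepting new
// connections on the first error, then exits after a grace period.
// setTimer is injectable only so the countdown can be faked in tests.
function makeDrainOnError({ stopAccepting, exit, drainMs = 30000, setTimer = setTimeout }) {
  let draining = false;
  return function onError(err) {
    console.error('caught by global domain:', err && err.stack ? err.stack : err);
    if (draining) return false; // countdown already running, nothing more to do
    draining = true;
    stopAccepting();                      // in-progress requests keep running
    setTimer(() => exit(1), drainMs);     // assume they are done after drainMs
    return true;
  };
}

// Wiring sketch: create the server inside one process-wide domain.
function startServer(handler, port) {
  const d = domain.create();
  let server;
  d.on('error', makeDrainOnError({
    stopAccepting: () => server.close(),  // refuse new connections only
    exit: (code) => process.exit(code),   // the supervisor starts a fresh process
  }));
  d.run(() => {
    server = http.createServer(handler).listen(port);
  });
  return server;
}
```

The load balancer matters here: once `server.close()` has been called, this process takes no new connections, so something else has to pick up the traffic during the countdown.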
We do this too, attaching req and res objects to a domain, as well as databases and other network-related objects (like SMTP clients, etc). This is a huge improvement, but I'm still seeing occasional uncaught error events in our logs on a very large codebase and only in production. Some of them are just ECONNRESET events with no details given, so their origins are REALLY hard to track down. Have you got some magic for catching everything without explicitly having to find all objects that could be emitting? I'd love to hear it if so...
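
For reference, explicitly attaching emitters looks something like the sketch below. `bindToDomain` and `handleRequest` are hypothetical names; `domain.add()` and the `domainEmitter` property that the domain module sets on errors it intercepts are part of Node's `domain` API. Logging `domainEmitter` can help identify which object emitted a bare ECONNRESET, but only for emitters that were bound in the first place.

```javascript
const domain = require('domain');

// Hypothetical helper: explicitly bind existing emitters (req, res, db or
// SMTP clients, ...) to a domain so their unhandled 'error' events go to it.
function bindToDomain(d, ...emitters) {
  for (const emitter of emitters) d.add(emitter);
  return d;
}

// Sketch of a per-request wrapper; `handler` is the real request handler.
function handleRequest(handler, req, res) {
  const d = domain.create();
  bindToDomain(d, req, res);
  d.on('error', (err) => {
    // err.domainEmitter is set when the error came from an emitter's
    // 'error' event rather than from a throw.
    const source = err.domainEmitter ? err.domainEmitter.constructor.name : 'throw';
    console.error('request failed (%s): %s', source, err.code || err.message);
    if (!res.headersSent) {
      res.statusCode = 500;
      res.end();
    }
  });
  d.run(() => handler(req, res));
}
```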
As soon as possible during startup, create a domain and enter it. Because entered domains form a stack, this will be a fallback if an error occurs at a place that isn't covered by any other domains.
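
A minimal sketch of that fallback, assuming Node's built-in `domain` module. `installFallbackDomain` and `runInRequestDomain` are made-up names for illustration; the second one is only there to show the stack behaviour, where a more specific domain sits on top of the fallback while it runs and the fallback is active again afterwards.

```javascript
const domain = require('domain');

// Hypothetical helper: call once, as early as possible in the entry script.
// The domain is entered and never exited, so it stays at the bottom of the
// domain stack and picks up errors no other domain covers.
function installFallbackDomain(onError) {
  const fallback = domain.create();
  fallback.on('error', onError);
  fallback.enter();
  return fallback;
}

// Hypothetical helper: run fn inside a more specific domain. run() enters
// the domain, calls fn, then exits, which makes the fallback active again.
function runInRequestDomain(fn, onError) {
  const d = domain.create();
  d.on('error', onError);
  return d.run(fn);
}
```

Timers, sockets and other emitters created while only the fallback is active are implicitly bound to it, which is what makes it useful as a catch-all.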