by emilsedgh 4084 days ago
On a server you generally cannot afford to have the event loop blocked by a computationally intensive task.

You are not supposed to use your main event loop for computationally intensive tasks.

Offload those tasks to separate workers and use queues.

That's basic Node knowledge. It's a trade-off that you're supposed to be aware of when using Node.
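
To make the blocking concrete, here is a minimal sketch (not from the original comment; `busyWait` and `measureTimerDelay` are made-up helper names) of how a synchronous CPU-bound loop delays even a 0 ms timer on the same event loop:

```javascript
// Blocks the calling thread, and therefore the event loop, for about `ms` milliseconds.
// The spin loop stands in for any CPU-heavy computation.
function busyWait(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // spin
  }
}

// Schedules a 0 ms timer, then blocks the loop for `blockMs`.
// Resolves with how many milliseconds passed before the timer callback ran.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    busyWait(blockMs);
  });
}

measureTimerDelay(50).then((delay) => {
  console.log(`0 ms timer fired after ${delay} ms`);
});
```

The timer cannot fire until the loop is free again, so every request waiting on that process would see the same delay.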

The problem with workers is that they don't have an event-loop (like the main thread). So it is not possible to use asynchronous code written for the main thread in those worker threads, which is of course quite limiting.

EDIT: I mean workers which run in a thread (as opposed to in a process). An example is given by the webworker-threads npm module. Threads allow one to structurally share large data-structures, so one does not have to serialize them when calling a worker (serializing large structures would block the main thread).

Sorry you are getting a lot of downvotes. For what it is worth I don't think you deserve them, as your comments just show inexperience and a lack of understanding of Node.js, and aren't trolling. However, I think you would be well served by doing some research into what Node.js is and how it works. Basically every Node.js process has an event loop. Your workers have an event loop just like your servers do.

Here is how a typical node stack works:

Nginx load balancer talks to a cluster of node server processes, one per core. The server processes handle all incoming web requests that won't block the event loop. On a typical REST server this is 99% of your tasks, and each node process can handle thousands of concurrent requests due to the way that the event loop works.

If there is a heavy, blocking task like processing an image or PDF file (although even these things should be able to be done in a nonblocking, streaming manner), the server processes send a message through a background queue such as RabbitMQ or Amazon SQS to a background process whose sole purpose is processing heavy tasks pulled from that queue. Fundamentally, if you are using Node.js properly you don't need multiple threads. Instead you use multiple processes, and the processes are essentially "threads" that can talk to each other using parent/child process communication, HTTP, Redis pub/sub, or any other mechanism you want.
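
A rough sketch of the parent/child variant, using only Node's built-in `child_process` module instead of a real queue like RabbitMQ or SQS. The names `workerSource` and `runInChild` and the job shape are assumptions for illustration, and a real setup would more likely keep a pool of long-lived workers than spawn one process per job:

```javascript
const { spawn } = require('child_process');

// Source of the background process, inlined so the sketch fits in one file.
const workerSource = `
  process.on('message', (job) => {
    // The heavy loop blocks only this child process, not the parent.
    let sum = 0;
    for (let i = 1; i <= job.n; i++) sum += i;
    // Replying from a timer callback shows the child has its own event loop.
    setTimeout(() => process.send({ id: job.id, result: sum }), 0);
  });
`;

// Sends one job to a freshly spawned Node child over an IPC channel
// and resolves with the child's reply.
function runInChild(job) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', workerSource], {
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    child.once('message', (reply) => {
      child.kill(); // one job per child in this sketch
      resolve(reply);
    });
    child.once('error', reject);
    // Has no effect if the promise was already resolved by a reply.
    child.once('exit', () => reject(new Error('child exited before replying')));
    child.send(job);
  });
}

runInChild({ id: 1, n: 1000000 }).then((reply) => {
  console.log('reply from child:', reply);
});
```

While the child is summing, the parent's event loop stays free to handle other requests.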

But there is no reason why anything should block a Node.js process if it is written properly. I've even done heavy video transcoding in a streaming manner in a Node.js process without blocking the event loop.

The reason for the downvotes, I suspect, is that this looks like an attempt to derail a thread to get tech support on a barely related topic. Worse, the initial comment was worded as "this thing sucks because..." instead of as a question, despite showing very little knowledge about the thing it complained about.

Thanks for the explanation and the moral support :)

I think most people here misread the line "facilitating structural sharing of large data-structures between parallel tasks, which cannot be done using ordinary processes" in my first post.

And by large data-structures, I don't necessarily mean structures which can be "naturally" streamed. I'm thinking more of a large index, for example, which can be used for fast lookup, and be used from several threads at the same time.

Having processes (here named workers) is a nice feature, but doesn't cut it when you want to share large amounts of data between threads (serializing that data would completely block the main thread). In my view, it is unfortunate that the designers of Node.js didn't opt for having multiple threads as opposed to putting every thread in a separate process.
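
To illustrate the sharing point: between processes, a message is serialized on one side and rebuilt on the other, so the receiver gets a copy, not a shared structure. The sketch below only simulates that round trip with JSON (the default serialization for `child_process` messages). `sendAcrossProcessBoundary` and the tiny `index` object are hypothetical stand-ins for a real, much larger index:

```javascript
// Simulates what happens to a message passed between two Node processes:
// it is turned into a string on the sender side and parsed on the receiver side.
// For a large structure, both steps run synchronously on the calling thread.
function sendAcrossProcessBoundary(message) {
  const wire = JSON.stringify(message); // sender side
  return JSON.parse(wire); // receiver side
}

const index = { apple: [1, 2], pear: [3] };
const received = sendAcrossProcessBoundary(index);

// The receiver holds an independent copy: changes do not propagate back.
received.apple.push(99);
console.log(received === index); // false
console.log(index.apple.length); // 2
```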

I ended up replying to one of your other comments with more details, but the answer to this problem is streams. You can use streams for any and all incoming data, whether it is file data uploaded via multipart upload from a browser, a streaming result set from a database, or raw data streaming out of a storage service like S3 or Dropbox. There is even a streaming JSON parser for Node.js in case you have the ungodly situation of having, say, a 500 MB JSON file or something horrible like that: http://oboejs.com/
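
A minimal sketch of the streaming idea, using only Node's built-in `stream` module. It counts newlines in a stream chunk by chunk, so the whole input never has to sit in memory at once. `countLines` is a made-up helper, and the in-memory source stands in for a file, upload, or database cursor:

```javascript
const { Readable, Writable, pipeline } = require('stream');

// Consumes a readable stream of Buffers and resolves with the number of
// newline bytes seen. Only one chunk is held at a time.
function countLines(source) {
  return new Promise((resolve, reject) => {
    let lines = 0;
    const counter = new Writable({
      write(chunk, encoding, callback) {
        for (const byte of chunk) {
          if (byte === 10) lines += 1; // 10 is the byte value of '\n'
        }
        callback(); // signal that we are ready for the next chunk
      },
    });
    pipeline(source, counter, (err) => (err ? reject(err) : resolve(lines)));
  });
}

// In-memory stand-in for a large input arriving in pieces.
const source = Readable.from([Buffer.from('first\nsecond\n'), Buffer.from('third\n')]);
countLines(source).then((n) => console.log(`${n} lines`));
```

Between chunks the event loop is free to serve other work, which is the property being argued for here.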

You're describing the worst possible use-case for node, and one that it explicitly is not intended to handle well. If you're doing computationally intensive tasks on large data sets, use a language that supports that. Node.js is intended for I/O-heavy workloads.

Node workers are just new processes. They do have an event-loop.