by amelius 4083 days ago
Thanks for the explanation and the moral support :)

I think most people here misread the line "facilitating structural sharing of large data-structures between parallel tasks, which cannot be done using ordinary processes" in my first post.

And by large data-structures, I don't necessarily mean structures which can be "naturally" streamed. I'm thinking more of a large index, for example, which supports fast lookup and can be read from several threads at the same time.

Having processes (here named workers) is a nice feature, but doesn't cut it when you want to share large amounts of data between threads (serializing that data would completely block the main thread). In my view, it is unfortunate that the designers of Node.js didn't opt for having multiple threads as opposed to putting every thread in a separate process.
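To make the serialization point concrete, here is a minimal sketch, assuming data crosses the process boundary as JSON (as with Node's default child-process IPC). `buildIndex` and `sendAcrossProcessBoundary` are hypothetical names used only for illustration, and no real worker is spawned: the round trip is simulated in one process to show that the receiver ends up with a copy rather than a shared structure.

```javascript
// Hypothetical stand-in for a "large index": key -> record.
function buildIndex(size) {
  const index = {};
  for (let i = 0; i < size; i++) {
    index[`key${i}`] = { id: i, tags: ['a', 'b'] };
  }
  return index;
}

// Simulates the cost of crossing a process boundary (assumption: JSON on the wire).
// The sender serializes synchronously; the receiver parses and gets its own copy.
function sendAcrossProcessBoundary(value) {
  const wire = JSON.stringify(value); // runs to completion on the calling thread
  return { bytes: Buffer.byteLength(wire), received: JSON.parse(wire) };
}

const index = buildIndex(10000);
const { bytes, received } = sendAcrossProcessBoundary(index);

// The data is equal, but no objects are shared between the two sides.
const sharesStructure = received.key1 === index.key1;
console.log(`serialized ${bytes} bytes, shares structure: ${sharesStructure}`);
```

Both the serialization time and the memory for the copy grow with the size of the index, and every additional worker needs its own copy.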

2 comments

I ended up replying to one of your other comments with more details, but the answer to this problem is streams. You can use streams for any and all incoming data, whether it is file data uploaded via multipart upload from a browser, a streaming result set from a database, or raw data streaming out of a storage service like S3 or Dropbox. There is even a streaming JSON parser for Node.js in case you have the ungodly situation of having, say, a 500 MB JSON file or something horrible like that: http://oboejs.com/
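A rough sketch of the streaming approach, using only Node's built-in `stream` module. The `records` generator and `totalSize` function are hypothetical and stand in for a real source such as a database cursor or an upload; the point is that each record is handled as it arrives and only a running aggregate is kept.

```javascript
const { Readable } = require('stream');

// Hypothetical source: yields records one at a time.
function* records(count) {
  for (let i = 0; i < count; i++) {
    yield { id: i, size: i % 10 };
  }
}

// Consume the stream incrementally, keeping only a running total in memory.
async function totalSize(stream) {
  let total = 0;
  for await (const record of stream) {
    total += record.size;
  }
  return total;
}

// Readable.from wraps the generator in an object-mode readable stream.
const done = totalSize(Readable.from(records(1000)));
done.then((total) => console.log(`total size: ${total}`));
```

This fits data that can be processed in a single pass. It does not by itself give several threads random access to one in-memory structure.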
You're describing the worst possible use case for Node, and one that it is explicitly not intended to handle well. If you're doing computationally intensive tasks on large data sets, use a language that supports that. Node.js is intended for I/O-heavy workloads.