by TomFrost 4083 days ago
1. Node.js is (for the most part) single-threaded; that's its draw. It's not trying to be a Swiss Army knife, and if your use case requires a threaded language then Node.js certainly isn't the tool for that job. But it might find a useful place in your toolbox for other tasks.

2. require() is part of the CommonJS spec, and how it physically works is dependent on the implementation. You point out that Node's implementation doesn't work well in the browser, but Node itself does not work in the browser so that point is moot. I agree that it might be interesting to load remote modules in Node, but keeping that operation synchronous does simplify the language quite a bit.

3. You can also map modules to public or private git repositories in the package.json, as long as the private key used during npm install has access. If the git repo has tags, a tag can be specified in the git uri as well. Private npm repos are the superior way to distribute private modules with wider access, but I think this is handled fairly cleanly already.
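
A minimal sketch of the git mapping described in point 3. The organization, module name, and tag are all hypothetical; only the `git+ssh://...#<tag>` shape of the dependency value is the point here.

```json
{
  "dependencies": {
    "hypothetical-private-module": "git+ssh://git@github.com/example-org/hypothetical-private-module.git#v1.2.0"
  }
}
```

With an entry like this, `npm install` clones the repository using whatever SSH key is available to the installing user, and the part after `#` pins the tag.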

Thanks for your comments. Some remarks here.

1. Node.js is a tool for building servers. On a server you generally cannot afford to have the event loop blocked by a computationally intensive task. You need threads.
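
To make the concern in point 1 concrete, here is a small sketch of what "blocking the event loop" means. The helper names `busyWait` and `measureTimerDelay` are made up for illustration; the busy loop stands in for any synchronous CPU-heavy task.

```javascript
// Spin the CPU synchronously for roughly `ms` milliseconds.
// This stands in for any computationally intensive task.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Nothing else can run on this thread while we loop.
  }
}

// Schedule a 0 ms timer, then block the thread for `blockMs`.
// Resolves with how late the timer callback actually fired.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    busyWait(blockMs); // The timer cannot fire until this returns.
  });
}
```

Calling `measureTimerDelay(50)` should report a delay of at least 50 ms for a timer that asked for 0 ms. An incoming HTTP request would be held up in the same way.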

2. It would only require a "promise" to make the module-loading asynchronous. Leaving that out is not what I would call "quite a bit of a simplification", especially if using asynchronous callbacks is the "modus operandi" of programming on the Node.js platform itself.
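
As a hedged illustration of point 2: later Node.js versions support the dynamic `import()` expression, which returns a promise for the loaded module. That was not available when this thread was written. The wrapper name `loadModuleAsync` is hypothetical.

```javascript
// Promise-based module loading via dynamic import().
// `loadModuleAsync` is a made-up helper name for this sketch.
function loadModuleAsync(specifier) {
  // import() resolves to the module namespace object.
  return import(specifier);
}

// For comparison, the synchronous CommonJS form blocks until loaded.
function loadModuleSync(specifier) {
  return require(specifier);
}
```

Usage would look like `loadModuleAsync('path').then((path) => path.join('a', 'b'))`, whereas `loadModuleSync('path')` hands back the module directly.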

3. Okay, I stand corrected. I remember that I waded through the documentation quite a bit though, trying to figure this out.

On a server you generally cannot afford to have the event loop blocked by a computationally intensive task.

You are not supposed to use your main event loop for computationally intensive tasks.

Offload those tasks to separate workers and use queues.

That's basic Node knowledge. It's a trade-off that you're supposed to be aware of when using Node.
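
A minimal sketch of offloading to a separate worker process, assuming the built-in `child_process` module and its IPC channel. The function name `fibInChildProcess` and the naive Fibonacci workload are illustrative only, and a real setup would keep workers alive and feed them from a queue rather than spawn one per job.

```javascript
const { spawn } = require('child_process');

// Source of the worker, run in a separate Node.js process via `node -e`.
// It receives a number over IPC and replies with the computed result.
const workerSource = `
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  process.on('message', (n) => {
    process.send({ n, result: fib(n) });
  });
`;

// Run the CPU-heavy computation in a child process so the parent's
// event loop stays free to handle other work in the meantime.
function fibInChildProcess(n) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', workerSource], {
      // The fourth entry opens an IPC channel between parent and child.
      stdio: ['ignore', 'inherit', 'inherit', 'ipc'],
    });
    child.once('error', reject);
    child.once('message', (msg) => {
      child.kill(); // One job per child in this sketch.
      resolve(msg.result);
    });
    child.send(n); // The message is serialized and sent over IPC.
  });
}
```

For example, `fibInChildProcess(30).then(console.log)` computes the value in the child while the parent keeps serving timers and I/O. Note that the argument and the result are both serialized for the trip between processes.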

The problem with workers is that they don't have an event-loop (like the main thread). So it is not possible to use asynchronous code written for the main thread in those worker threads, which is of course quite limiting.

EDIT: I mean workers which run in a thread (as opposed to in a process). An example is given by the webworker-threads npm module. Threads allow one to structurally share large data-structures, so one does not have to serialize them when calling a worker (serializing large structures would block the main thread).

Sorry you are getting a lot of downvotes. For what it is worth I don't think you deserve them, as your comments just show inexperience and lack of understanding of Node.js, and aren't trolling. However, I think you would be well served by doing some research into what Node.js is and how it works. Basically every Node.js process has an event loop. Your workers have an event loop just like your servers do.

Here is how a typical node stack works:

Nginx load balancer talks to a cluster of node server processes, one per core. The server processes handle all incoming web requests that won't block the event loop. On a typical REST server this is 99% of your tasks, and each node process can handle thousands of concurrent requests due to the way that the event loop works.

If there is a heavy, blocking task like processing an image or PDF file (although even these things should be doable in a nonblocking, streaming manner), the server processes send a message through a background queue such as RabbitMQ or Amazon SQS to a background process whose sole purpose is processing heavy tasks pulled from that queue. Fundamentally, if you are using Node.js properly you don't need multiple threads. Instead you use multiple processes, and the processes are essentially "threads" that can talk to each other using parent/child process communication, HTTP, Redis pubsub, or any other mechanism you want.

But there is no reason why anything should block a Node.js process if it is written properly. I've even done heavy video transcoding in a streaming manner in a Node.js process without blocking the event loop.

The reason for the downvotes, I suspect, is because this looks like an attempt to derail a thread to get tech support on a barely-related topic. Worse, the initial comment was worded as "this thing sucks because..." instead of a question, despite showing very little knowledge about the thing it complained about.

Thanks for the explanation and the moral support :)

I think most people here misread the line "facilitating structural sharing of large data-structures between parallel tasks, which cannot be done using ordinary processes" in my first post.

And by large data-structures, I don't necessarily mean structures which can be "naturally" streamed. I'm thinking more of a large index, for example, which can be used for fast lookup, and be used from several threads at the same time.

Having processes (here named workers) is a nice feature, but doesn't cut it when you want to share large amounts of data between threads (serializing that data would completely block the main thread). In my view, it is unfortunate that the designers of Node.js didn't opt for having multiple threads as opposed to putting every thread in a separate process.
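
For what it's worth, later Node.js releases added a built-in `worker_threads` module, which was not available to anyone in this thread. The sketch below assumes that module and shows a `SharedArrayBuffer` being handed to a thread without copying. The helper names `makeSharedIndex` and `sumSharedInThread` are hypothetical, and a flat integer array is only a stand-in for a real index structure.

```javascript
const { Worker } = require('worker_threads');

// Worker source, evaluated in a separate thread of the same process.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // View over the same memory the main thread wrote to (no copy).
  const view = new Int32Array(workerData.shared);
  // Timers work here because the worker thread runs its own event loop.
  setTimeout(() => {
    let sum = 0;
    for (let i = 0; i < view.length; i++) sum += view[i];
    parentPort.postMessage(sum);
  }, 0);
`;

// Copy plain values into shared memory once, up front.
function makeSharedIndex(values) {
  const shared = new SharedArrayBuffer(
    values.length * Int32Array.BYTES_PER_ELEMENT
  );
  new Int32Array(shared).set(values);
  return shared;
}

// Hand the shared buffer to a thread and resolve with its answer.
function sumSharedInThread(shared) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true, // Treat the first argument as source code, not a path.
      workerData: { shared }, // SharedArrayBuffer is shared, not cloned.
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```

Something like `sumSharedInThread(makeSharedIndex([1, 2, 3, 4]))` resolves with the sum computed in the other thread. Only raw bytes can be shared this way, so a search tree would still have to be laid out inside typed arrays by hand.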

I ended up replying to one of your other comments with more details but the answer to this problem is streams. You can use streams for any and all incoming data, whether it is file data uploaded via multipart upload from a browser, or streaming result set from a database, or raw data streaming out of a storage service like S3 or Dropbox. There is even a streaming JSON parser for Node.js in case you have the ungodly situation of having say a 500 MB JSON file or something horrible like that: http://oboejs.com/

You're describing the worst possible use-case for node, and one that it explicitly is not intended to handle well. If you're doing computationally-intensive tasks on large data sets, use a language that supports that. Node.js is intended for I/O-heavy workloads.

Node workers are just new processes. They do have an event-loop.

@1. You don't need threads if workers are enough for you. That's how you should do computationally intensive tasks...

Computationally intensive tasks often take large amounts of data as input. And sharing data with a worker always has to be done by serializing this data (in a message). So for large inputs, this approach doesn't work (the main thread would block the CPU while serializing the messages).

But my biggest problem with workers is that they don't have an event-loop, so I can't share asynchronous code between the main thread and the workers.

There is no need to serialize large amounts of data. The way it is designed to work in Node is you use a stream. So for example let's say you have a multi-TB data dump in Amazon S3, you want to process it, and then upload a transformed multi-TB result set back to Amazon S3. (This is something I've worked on before).

The way it works is you open a download stream from S3, pipe it into a Node.js transform stream, and then pipe that stream into an upload stream that uploads the data back to S3 using the multipart upload API.

The Node.js design is very much like using Unix pipes. You can pipe a huge multi-TB file through grep without blocking anything. The data just streams from disk into the grep process, grep filters it down to things that match, and then streams the results onto the screen.

Computation on huge streams in Node.js works the same way. Your event loop remains unblocked even when operating on a stream terabytes in size because you are only ever touching a portion of the dataset at a time. Additionally, if you do it properly your overall memory usage remains low, as you are exporting the data back out of the machine as fast as it comes in. I've used this technique to process streaming data many GB in size while keeping the node process under 200 MB of memory used from the system perspective.
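
The download, transform, upload shape described above can be sketched with the built-in `stream` module. This is only an illustration: an in-memory source and sink stand in for the S3 download and upload streams, and `upperCaseTransform` and `runPipeline` are made-up names.

```javascript
const { Readable, Transform, Writable, pipeline } = require('stream');

// A transform stream that processes one chunk at a time, so only a
// small part of the data set is ever held in memory.
function upperCaseTransform() {
  return new Transform({
    transform(chunk, encoding, callback) {
      callback(null, chunk.toString().toUpperCase());
    },
  });
}

// Pipe source -> transform -> sink and resolve with what the sink saw.
function runPipeline(chunks) {
  return new Promise((resolve, reject) => {
    const received = [];
    // Stand-in for a download stream (e.g. from a storage service).
    const source = Readable.from(chunks);
    // Stand-in for an upload stream; here it just collects the chunks.
    const sink = new Writable({
      write(chunk, encoding, callback) {
        received.push(chunk.toString());
        callback(); // Signals readiness for the next chunk (backpressure).
      },
    });
    pipeline(source, upperCaseTransform(), sink, (err) => {
      if (err) reject(err);
      else resolve(received.join(''));
    });
  });
}
```

For instance, `runPipeline(['ab', 'cd'])` resolves with the transformed text once the sink has finished. Swapping the stand-ins for real network streams leaves the middle of the pipeline unchanged.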

Recommended reading: https://nodejs.org/api/stream.html

Here is an example of an upload stream that I created for the use case of processing a large multi-TB data set and piping the result up to Amazon S3: https://www.npmjs.com/package/s3-upload-stream

For streaming, I can see that this can work.

But basically, what I wanted to do, is implement a module that works as an index between threads (e.g., a search-tree for fast lookup). However, since in Node.js all threads are in a separate process, it is (afaict) impossible to make this efficient, as processes do not share data.

So in Node.js this would be accomplished by using a shared data store like Redis. For example I run eight processes per c3.xlarge instance, and the instances share a Redis which contains data like that. Indexes in particular could be stored in the Redis hash structure.

Basically Node.js is designed around the concept of microservices and separation of concerns. Rather than doing everything in one giant, multithreaded monolithic process you break your service up into loosely coupled components that talk to each other via messaging and share common datastores. Some people really like this pattern (I'm a strong advocate of it myself) because it scales really, really well.

It sounds like nodejs is not what you want then.