by jonpress 4079 days ago
The root of the event loop issue is not Node-specific. The real underlying issue here is that one CPU core is given too much work while others are more or less idle. Offloading some of the work to another process naturally solves this issue. It doesn't really matter whether that other process is a Go program or a Node.js one; both approaches would have solved the problem. Attributing credit to Go itself for solving the issue is disingenuous.

If you ran Go as a single thread, you would also run into similar issues. The main advantage of Go is that it makes it easier to parallelize your code thanks to goroutines and channels (a single source file can encapsulate the logic of multiple concurrent processes).
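
For anyone who hasn't seen it, a minimal sketch of the goroutines-and-channels style, assuming a made-up CPU-bound task (summing squares). The name `sumSquares` and the worker count are illustrative only, not taken from the article:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumSquares is a hypothetical CPU-bound task. It fans the input out to a
// fixed number of worker goroutines over one channel and collects one
// partial sum per worker over another.
func sumSquares(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	partials := make(chan int, workers) // one slot per worker, so sends never block

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sum := 0
			for n := range jobs { // runs until jobs is closed
				sum += n * n
			}
			partials <- sum
		}()
	}

	for _, n := range nums {
		jobs <- n
	}
	close(jobs)
	wg.Wait()
	close(partials)

	total := 0
	for p := range partials {
		total += p
	}
	return total
}

func main() {
	// One worker per available core; the Go runtime maps them onto OS threads.
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, runtime.NumCPU())) // prints 30
}
```

Everything lives in one source file, which is the point being made above: the concurrent pieces are not separated by a process boundary.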

That said, I find that this 'ease of concurrency' makes Go code less readable. In Node.js, it's really easy to identify process boundaries since the child_process module forces you to put code into different files and communicate via loosely coupled IPC channels.

Most of the Node.js vs Go arguments are weak. It's surprising that Node.js is still outpacing Go in popularity in spite of all this slander.

6 comments

>Most of the Node.js vs Go arguments are weak. It's surprising that Node.js is still outpacing Go in popularity in spite of all this slander.

That's a rather defensive position for Node in what is a very rare use case for the language. It's unsurprising that Node.js is still outpacing Go, given the large number of JS developers and the fact that Go is pretty much worthless for hosting front-end web applications (you won't find your favorite asset pipeline in Go).

It's not surprising at all that they switched to a different language for data processing and pipelines, though it's still somewhat surprising that they chose Go, given that most teams in a situation like this would switch to the even more popular JVM/Spark/Storm/Kafka stack.

Finally, your statement that "The real underlying issue here is that one CPU core is given too much work while others are more or less idle" isn't accurate. The issue is that one thread has too much work; no modern OS built in the last 20 years would allow a single process to hog all the CPU time unless you explicitly turned off the kernel's scheduling. The root of the event loop issue is event-loop specific, and it's even more Node-specific since the event loop is pretty much the only way to achieve concurrency in Node. Other languages (like Go and Java) at least have options for other models of concurrency.

Consider the following: what if both processes got tied up? Do you just start another process? Would it be feasible or wise to run 1000 processes? (No, it wouldn't.) However, this is a problem that you won't come across in Go if you use goroutines and take advantage of its scheduler, as you can easily run thousands of goroutines performantly.

That said, this is a rather narrow use case on which to judge one language better than the other; it's just the case that Go is likely better suited for this kind of service.

> The issue is that one thread has too much work

The underlying issue is that when you have a consumer-facing API which accepts HTTP requests with a body, the first thing you should think about is limits.

> Consider the following: what if both processes got tied up? Do you just start another process? Would it be feasible or wise to run 1000 processes? (No, it wouldn't.) However, this is a problem that you won't come across in Go if you use goroutines and take advantage of its scheduler, as you can easily run thousands of goroutines performantly.

I have no experience with Go, but my understanding is that goroutines are green threads multiplexed over a small thread pool. If you get 5 MB of JSON in N different requests (N=number of cores) at the same time, I don't see Go generating free CPU time out of thin air. The usual way to go about these things in a language without multithreading is to have a queue and a process pool, but this also won't magically solve the issue if all cores are busy.

> If you get 5 MB of JSON in N different requests (N=number of cores) at the same time, I don't see Go generating free CPU time out of thin air.

You don't, but the scheduler normally won't allow one thread to completely starve the CPU. Of course, it's clear they should be using limits. However, the JVM, glibc's thread scheduler, or Go's green threads likely wouldn't allow a single thread to completely starve the CPU; eventually the scheduler will step in and divert resources to another thread.

Without limits in a threaded solution, you would see the latency increase, but you wouldn't see the application stop taking requests altogether.

However, there are real benefits to event-loop concurrency, so this shouldn't be taken as a reason one model is strictly better than another.

Every language requires you to be careful.

C makes you do array out-of-bounds checks. JavaScript makes you worry about tying up your event loop with, e.g., massive string processing.

Just get a friggin asynchronous JSON parser if you're running it on untrusted client input (i.e. any client input). It's not that hard.

Maybe Node should provide a "tainted" feature for modules to mark variables that are "untrusted" and provide some warnings when functions like JSON.parse are run on them.

The upside of JS is massive: it's easier to reason about control flow than with threads, and much easier to build something much FASTER and more efficient than with threads.

I think good languages require you to care about things that matter for your domain. For example, C's bounds checks are a consequence of demanding fine-grained control.

The problem for me with Node here is that the whole cooperative-multitasking thing doesn't directly buy you anything. It's a historical accident, not a necessary downside of an otherwise-positive choice. That's distinct from a browser or a GUI environment, where letting a single thread control the display and events really does buy you things you care about.

I care about single-threadedness and the evented paradigm because they provide guarantees and simplify my reasoning about things. I know that if I call a function, it will return synchronously, but not necessarily with a callback. I know that my objects won't be clobbered by other threads, etc.

> It's surprising that Node.js is still outpacing Go in popularity in spite of all this slander.

From the perspective of a JVM/.NET/C++ developer, I find it surprising that Node.js got adopted at all in the server space.

Why? PHP/Ruby/Python are successful in the server space too. JavaScript, especially ES6, isn't worse than the former. Devs should know by now that the dumbest tool that is good enough has a good chance of being successful today. All this tech won't replace enterprise tech, but enterprise dev is a tiny percentage of all devs out there.

"...enterprise dev is a tiny percentage of all devs out there."

I may be terribly confused here, but I'm pretty sure this is the precise opposite of the facts. In terms of jobs, all the OSS languages TOGETHER are not as popular as Java alone, never mind adding in C#/.NET. It's a common misunderstanding, but one that seems unlikely to do a young coder any good...

It's HN bias. Enterprise rarely gets talked about (because by its nature it rarely does cutting edge things) but is still the huge mass of the iceberg under the water.

> enterprise dev is a tiny percentage of all devs out there

I don't have specifics to quote, but enterprise dev is definitely not a "tiny" percentage.

Large percentage in cost. Tiny percentage in ROI. Blue ocean for disintermediation.

> PHP/ruby/python are successful too in the server space.

Again, from the perspective of someone that worked in a startup using TCL for server applications, I also don't get it.

The only explanations I see are that they are attractive to developers without formal education, and that after a certain scale a rewrite becomes too costly.

The work we did at that company taught me never to use a technology stack without a JIT or AOT compiler for production code.

The amount of money that Facebook has poured into a PHP AOT compiler and now a JIT is proof of it: it is just cheaper to improve the stack than to rewrite the code.

That's really all about how you structure things. If you're building small, distributed, API-oriented applications -- it's really very easy to re-write your infrastructure. If you're building large monolithic applications, with in-band communication only, you're right, it is pretty tough.

The "formal education" piece is a little rough though. Count me among the developers working with Node.js and a university degree.

> "...developers without a formal education..."

I think it's more about developers without experience. If you have experience doing something successfully, you are more likely to reach for the same tool set when doing it again. And someone who has done something successfully once is less likely to make naive mistakes when doing it again.

I think most issues, like the ones in this article, are much more likely to be attributable to these two facts than to Node.js as a language.

Contrived analogy: An experienced builder sets out to build a home. He grabs a Stanley hammer and successfully builds a home. Another person, who has never built anything, grabs a DeWalt hammer and fails miserably. Does this mean that only Stanley hammers are appropriate for building a house?

Seriously? You're saying that the only possible use case for Ruby/Python/PHP is "not being a computer scientist"?

Node exists because it can, not because it should.

Also, language-wise, TypeScript seems much more interesting than Go.

> If you ran Go as a single thread, you would also run into similar issues.

Yes, if you hobbled yourself, you would be hobbled.

The inability to do concurrency properly is not a feature, it's a missing feature. The whole point of Node was to be a node.

I'm also unsure about this comment. It reads to me as "The main advantage of Go is... that it makes it really easy to not have this problem." The OP talks about a single core being an issue. It is not an issue for Go, thanks to cooperative goroutine scheduling and preemption at function calls. You can still shoot yourself in the foot, but it's really easy to create an escape hatch by just making a function call or occasionally yielding to the scheduler...
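
A rough sketch of that escape hatch, assuming a made-up CPU-bound loop. `runtime.Gosched` and `runtime.GOMAXPROCS` are real standard-library calls; `crunch` and its `yieldEvery` parameter are hypothetical. Note that since Go 1.14 the runtime can also preempt goroutines asynchronously, so explicit yields like this matter less than they did when this thread was written:

```go
package main

import (
	"fmt"
	"runtime"
)

// crunch is a hypothetical tight loop. Every yieldEvery iterations it hands
// the processor back to the scheduler so other goroutines can run.
// A yieldEvery of 0 disables yielding.
func crunch(n, yieldEvery int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
		if yieldEvery > 0 && i%yieldEvery == 0 {
			runtime.Gosched() // explicit cooperative yield
		}
	}
	return sum
}

func main() {
	// Restrict the runtime to one OS thread running Go code, to mimic the
	// single-core situation discussed above.
	runtime.GOMAXPROCS(1)

	done := make(chan struct{})
	go func() {
		fmt.Println("other goroutine got a turn")
		close(done)
	}()

	fmt.Println(crunch(1000000, 10000))
	<-done
}
```

The yield interval is exactly the kind of knob that has to be re-tuned when the input changes, which is the cost of doing this by hand.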

> really easy to create an escape hatch by just making a function call or occasionally yielding to the scheduler...

Having dealt with cooperative multitasking back in the dark ages, I definitely don't believe it is easy. With proper threads, you just write your code in a straightforward manner. With cooperative multitasking, you now have to be continuously imagining performance and sprinkling in otherwise useless calls every time you think something might take a while. When you get that wrong, which will be a fair bit, you have to go back and re-sprinkle. And then when the character of your input changes, you get to re-sprinkle again.

I was ok with it in the dark ages; there wasn't an alternative given the hardware of the time. But now? Even watches are multi-core. I want to use languages that make parallelization easy.

Meh, it really doesn't come up that often. Every method call is an opportunity for the scheduler to run, not just specific ones. I find it is very rare indeed for a significant amount of work to be done without making method calls. Go also uses threads and makes parallel programming easy. For me, anyway.

Ah, I didn't realize that it happened at every method call. That seems more manageable. Thanks.

> The main advantage of Go is that it makes it easier to parallelize your code thanks to goroutines and channels

Well, that and Go is not bound to a VM (JIT or not), and will simply be faster than JavaScript, given the exact same logic. Don't discount the cost of running on a VM, and all of the abstractions which are thrown on top of plain JavaScript to help manage the callback complexity.

Given enough money, a JIT can be made arbitrarily fast, it seems (e.g. the JVM, the LLVM bitcode interpreter).

Given time, don't assume any particular language implementation will always be faster. Go might only run faster than JavaScript in the odd years, depending on corporate budgets for compiler / VM tuning the previous year.

The JVM has received a lot of money and attention over the years, yet still falls quite short of its direct compiled competitors in almost any benchmark. The threshold for "enough" when it comes to improving a JIT is still arbitrarily high.

> In Node.js, it's really easy to identify process boundaries since the child_process module forces you to put code into different files and communicate via loosely coupled IPC channels.

I find that approach troubling. Process boundaries are expensive, because you have to serialize and deserialize everything each time you cross a boundary.

I also haven't used Go, but I think you can get great clarity with something like Akka's Actor model. And to do it you don't have to pay a large serialization tax until you move particular actors to other machines.

Yes, socket/pipe-based IPC is more expensive than shared memory up to a point, but it's more scalable since you don't have to deal with locking (mutexes, semaphores) and the limits this imposes.

There is scaling and then there is scaling.

Sometimes you want to scale up to the box you are on and the most efficient way to do that is to use threads in a single process.

When you hit the limits of a single machine, then it's time to start scaling out to multiple machines. You are right that then you have to start paying the communication/serialization costs, and in return you get much greater scale.

However, you don't have to start paying that cost until you scale in that direction. You can get quite a lot out of a multicore machine these days without having to pay serialization costs if you use threads.

Many times the serialization costs aren't worth the benefit unless you are getting a whole other box with another 32+ cores out of the deal. Paying it before you get that benefit isn't efficient engineering. And for some people choosing a language or framework that forces them to pay that cost before it's necessary is a bad idea.

If you don't want to deal with locking but are willing to pay extra memory cost, then you can just duplicate the data. A re/de-serialization step is a much more expensive way to do that.
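
To make the two in-process options concrete, here is a small sketch assuming a made-up word-counting task. `countShared` and `countCopied` are hypothetical names: the first shares one map behind a mutex, the second gives each goroutine its own private map and merges at the end, trading memory for lock-free work. Neither involves any serialization:

```go
package main

import (
	"fmt"
	"sync"
)

// countShared has every goroutine update one shared map guarded by a mutex.
func countShared(words []string, workers int) map[string]int {
	counts := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < len(words); i += workers {
				mu.Lock()
				counts[words[i]]++
				mu.Unlock()
			}
		}(w)
	}
	wg.Wait()
	return counts
}

// countCopied gives each goroutine a private map (extra memory, no lock)
// and merges the partial results once all goroutines have finished.
func countCopied(words []string, workers int) map[string]int {
	partials := make([]map[string]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			local := make(map[string]int)
			for i := w; i < len(words); i += workers {
				local[words[i]]++
			}
			partials[w] = local // each goroutine writes only its own slot
		}(w)
	}
	wg.Wait()

	total := make(map[string]int)
	for _, p := range partials {
		for k, v := range p {
			total[k] += v
		}
	}
	return total
}

func main() {
	words := []string{"go", "node", "go", "jvm", "go", "node"}
	fmt.Println(countShared(words, 3)["go"], countCopied(words, 3)["go"]) // prints 3 3
}
```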

> If you ran Go as a single thread, you would also run into similar issues.

Err, no you wouldn't. You don't have to deal with interprocess communication when you use goroutines; you share memory between goroutines, i.e. true concurrency, whereas in Node you'd fork some stuff, serialize data between processes, and message with Redis to make sure everything is notified that something happened. Node isn't concurrent. Go is.

That's why you read all these blog posts about "how I moved from Node.js to Go (or Rust)" and that's why you'll read more of them.