by baudehlo 5147 days ago
We know the Java libraries are there for lots of things, but Node is all about async and handling thousands of connections. It does this by forcing the entire ecosystem to be async too (including things like database drivers).

By using a Java JDBC database driver you're completely losing any async support. The same presumably goes for Redis or Mongo drivers. You can do some of the work with threads and pooling, but it's still not the same, and it makes this another useless micro benchmark.

7 comments

The value of this is far overstated. You can think of your server as being composed of async queues and thread pools, even in Node. About the only thing in a stack that truly is async is connection handling via epoll. MySQL itself is a big thread pool, bounded by the number of cores and table locking.

The JVM has amazing threading support, doubly so if you use it with a language like Scala or Clojure. You can and should handle the connections asynchronously and use a thread pool for things like DB access. It works well; people have done this with the JVM for years.
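To make the shape concrete, here is a minimal sketch of that pattern in Python rather than on the JVM: requests are handled on an event loop and the blocking call is pushed onto a thread pool. `blocking_query`, `handle_request`, and the pool size are all made up for illustration; the sleep stands in for a synchronous driver call.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(user_id):
    # Hypothetical stand-in for a synchronous database driver call.
    time.sleep(0.05)
    return {"id": user_id, "name": f"user{user_id}"}


async def handle_request(pool, user_id):
    loop = asyncio.get_running_loop()
    # The worker thread blocks on the query; the event loop stays free.
    return await loop.run_in_executor(pool, blocking_query, user_id)


async def serve(n_requests, pool_size=4):
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        tasks = [handle_request(pool, i) for i in range(n_requests)]
        # gather() returns results in the order the tasks were passed in.
        return await asyncio.gather(*tasks)


def run_demo(n_requests=8):
    return asyncio.run(serve(n_requests))
```

With a pool of 4 and 8 requests, the queries run in two batches while the loop itself never blocks.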

Node's API is async. Under the hood everything is done via threadpools, same as in Java or any other stack. Your hardware knows how to run threads; that's all. Whether or not that's what's exposed to you as a programmer is a different story.

You don't actually get a performance boost from Node being "async". Node's async abilities simply give you transparent access to threads that are otherwise unavailable in JavaScript, and it's the threads giving you performance.

I don't think this is true. Node.js uses epoll/kqueue/select etc. to multiplex access to multiple file descriptors from a single main thread.
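A rough illustration of that multiplexing, in Python instead of Node: one thread waits on several sockets at once through the standard `selectors` module, which picks epoll, kqueue, or select depending on the platform. The `multiplex` helper and the use of local socket pairs are assumptions for the sake of a self-contained sketch; it expects short, non-empty messages.

```python
import selectors
import socket


def multiplex(messages):
    # One selector, one thread, many file descriptors.
    sel = selectors.DefaultSelector()
    pairs = []
    for i, msg in enumerate(messages):
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=i)
        writer.sendall(msg)  # assumes a short, non-empty message
        pairs.append((reader, writer))

    received = {}
    while len(received) < len(messages):
        # Blocks until at least one registered socket is readable.
        for key, _events in sel.select(timeout=1.0):
            received[key.data] = key.fileobj.recv(4096)

    for reader, writer in pairs:
        sel.unregister(reader)
        reader.close()
        writer.close()
    sel.close()
    return [received[i] for i in range(len(messages))]
```

No extra threads are created here; readiness notification from the OS is what lets the single thread service every socket.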

The async API is actually a price to pay (spaghettification).

For example, Go took a different approach: it created a cheap thread-like construct that doesn't incur the biggest overheads of classical threading (namely pre-allocated linear stacks, and context switching/sleeping that requires a system call; all this aided by the compiler), and a cheap means of communication (channels).

Then the whole core IO library was written using a multiplexing async model (epoll...), which communicates with the user part of the library via channels. The result is a blocking-like API which under the hood behaves like an async implementation.
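This is not Go, but a loose Python analogy may help: coroutines that read top to bottom like blocking code, with an `asyncio.Queue` standing in for a channel and the event loop doing the multiplexing underneath. The names `producer`, `consumer`, and `pipeline` are invented for this sketch, and `None` is used as an assumed end-of-stream marker.

```python
import asyncio


async def producer(chan, items):
    for item in items:
        # Reads like a blocking send; actually yields to the event loop
        # whenever the queue is full.
        await chan.put(item)
    await chan.put(None)  # sentinel: no more items


async def consumer(chan):
    out = []
    while True:
        item = await chan.get()  # reads like a blocking receive
        if item is None:
            return out
        out.append(item * 2)


async def pipeline(items):
    chan = asyncio.Queue(maxsize=1)  # small buffer, channel-like
    _, doubled = await asyncio.gather(producer(chan, items), consumer(chan))
    return doubled


def run_pipeline(items):
    return asyncio.run(pipeline(items))
```

The point of the comparison is only the programming model: no callbacks appear in user code even though nothing here blocks a thread.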

A similar goal is also met by http://www.neilmix.com/narrativejs/doc/ and other JavaScript 'weavers' which convert "sequential-looking" code into callbacks.
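To give a feel for what such a weaver does (this is not how Narrative JS itself is implemented, just a toy in Python), a generator can be driven by a small trampoline so that each `yield` becomes a callback registration. `fetch` and `weave` are hypothetical names, and `fetch` calls back immediately, where a real API would call back later from an event loop.

```python
def fetch(key, callback):
    # Hypothetical callback-style API; calls back immediately for simplicity.
    callback(f"value-of-{key}")


def weave(gen_func):
    # Turns a generator that yields keys into a callback-driven function.
    def start(on_done):
        gen = gen_func()

        def step(value=None):
            try:
                key = gen.send(value)  # resume until the next yield
            except StopIteration as stop:
                on_done(stop.value)    # generator returned: deliver the result
                return
            fetch(key, step)           # step becomes the callback

        step()

    return start


@weave
def sequential_looking():
    a = yield "a"  # reads like a blocking call
    b = yield "b"
    return [a, b]


results = []
sequential_looking(results.append)
```

The body of `sequential_looking` never mentions a callback, yet every `yield` is executed as one.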

Yes, but at the end of the day, underneath it all, you gotta have threads because that's what the hardware understands. Even if you use hardware interrupts to detect IO, you still need a plain old thread to handle it. The only difference between various languages and runtimes is how you distribute tasks among the threads. Some environments provide green threads that have a partial stack, but even they are handed off to a thread pool (or a single thread) for execution.

It's been found that if you employ only a single thread (that can run any number of tasks) you get a performance boost over using a larger threadpool under some conditions, but a single thread wouldn't let you scale by taking advantage of all processors.

I feel that the cause of the misunderstanding lies in the fact that "thread" in this context usually means "thread-based IO", which means that when a thread issues an IO request it remains blocked until the IO request returns, leaving CPU time to other threads. All this holds regardless of how many processors you have; it works perfectly fine with a single processor.

Async IO is different: it's a different pattern of access to IO, and as such it's orthogonal to any threading or multiprocessing that's going on in order to actually do stuff in response to that IO.

> It's been found that if you employ only a single thread (that can run any number of tasks) you get a performance boost over using a larger threadpool under some conditions, but a single thread wouldn't let you scale by taking advantage of all processors.

Indeed. Node.js's solution to this problem is to have a cluster of Node.js processes and a dispatcher process on top. So multiprocessing is done the "old way".

In that case, Java gives you both options: blocking IO and non-blocking, polling IO. Netty can use both, but most people use it with the non-blocking option. Experiments have shown that sometimes one is faster and sometimes the other.

Umm, hardware knows NOTHING about threads. Threads give you a very fake view of the hardware. Everything about threads is an emulation over the hardware layer, hence why they have a large memory overhead.

The CPU is aware of a thread's instruction pointer and stack pointer (that's how some CPUs are able to support hyperthreading). Perhaps it's possible that the OS could somehow manipulate that to implement threads that are not as heavyweight as "common" threads, but I'm not aware of any OS that does that. Threads are the only multiprocessing abstraction provided by the CPU and the OS (although now there are some new abstractions for GPUs).

Vert.x has a hybrid model.

It has both event loops and a background thread pool, so you can choose which to run your task on depending on what kind of thing it is.

E.g. it's stupid to run long running or blocking actions on an event loop.
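A small Python sketch (not Vert.x, and with made-up names) of why that matters: a heartbeat task counts how often the loop gets to run while some 0.2-second blocking work happens, either inline on the event loop or handed to a background thread pool.

```python
import asyncio
import time


async def count_ticks(offload):
    ticks = 0

    async def heartbeat():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    await asyncio.sleep(0)  # let the heartbeat start
    if offload:
        # Blocking work runs on the default thread pool; the loop keeps ticking.
        await asyncio.get_running_loop().run_in_executor(None, time.sleep, 0.2)
    else:
        time.sleep(0.2)  # blocks the event loop itself; nothing else can run
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return ticks


def compare():
    inline = asyncio.run(count_ticks(offload=False))
    offloaded = asyncio.run(count_ticks(offload=True))
    return inline, offloaded
```

Run inline, the heartbeat never gets a turn during the blocking call; offloaded, it keeps ticking the whole time.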

> forcing the entire ecosystem to be async too

You say that like it's a good thing. I'd much rather have the choice between async and sync.

The problem comes when you mix the two. Try it. Try scaling it (to a hundred thousand connections). I've done it and it doesn't mix.

Sure, programming-wise it's nice to have sync. No argument there.

> It does this by forcing the entire ecosystem to be async too (including things like database drivers).

Yes, that's exactly Node's main selling proposition everybody forgets when presenting their next Node.js

I would call it "node.js's largest implementation issue". It's not as if JavaScript gives you another choice, yet you make it sound like it was a principled decision.

Other platforms/languages have real concurrency constructs and don't suffer node's limitations.

Well, no, you could have done all of these things synchronously, and in fact JS would have preferred it because JS is intrinsically single-threaded. Ryan Dahl's stated inspiration for Node was that he struggled with a certain slowness in Ruby because it blocked for everything, so he tried to build an entire platform that simply wouldn't let you sleep(). You can go listen to his talks; they're on YouTube. It was a principled decision.

I don't know whether making JS single-threaded was a principled decision -- if anything it was presumably the KISS principle at work. However, it was actually a ridiculously nice choice to offer a single-threaded-asynchrony model. It sometimes gets in the way rather obtusely -- Firefox can still (if very rarely) fail to introspect and then crash when some ad script on your page goes into an infinite loop! -- but on the whole, it is very nice to always know that while I'm in this function, modifying this variable, nobody else can interfere.

With that said, I also think that good concurrency planning is indeed missing, and that it will probably enter the language at a future time.

Actually that's not a "selling proposition".

At best it's "making a virtue out of necessity".

> We know the Java libraries are there for lots of things, but Node is all about async and handling thousands of connections. It does this by forcing the entire ecosystem to be async too (including things like database drivers). By using a Java JDBC database driver you're completely losing any async support. The same presumably goes for Redis or Mongo drivers. You can do some of the work with threads and pooling, but it's still not the same, and it makes this another useless micro benchmark.

Sounds like you're quoting from the "Node.js Is Bad Ass Rock Star Tech" satire video ( http://www.youtube.com/watch?v=bzkRVzciAZg ).

There's nothing inherently special about "forcing the entire ecosystem to be async too", especially since Node is more or less FORCED to do that, because JavaScript is single-threaded.

Add the bad callback-spaghetti implementation of async, and the main benefits of Node are easy deployment and accessibility to the millions of JavaScript programmers.

As an async environment it doesn't offer anything either new or too compelling.

> As an async environment it doesn't offer anything either new or too compelling

That isn't strictly true.

If you focus on the traditional scripting languages as your competition (Ruby, Python, PHP, Perl), then you start to realise that Node.js offers a similar language structure (dynamic, no compilation, etc.), with the benefit of thousands of concurrent connections (which those languages can also handle with certain modules), while forcing all the libraries to be async too (which those languages DO NOT do).

At my last job I had to build an SMTP server capable of scaling to 50k concurrent connections. Building this in Perl was fine, except for any library I wanted to use: all of the libraries were synchronous. So I wrote Haraka, which Craigslist are now using as their incoming SMTP server.

If you compare all that to Java you get slightly less performance but probably lower memory requirements. And that's OK. Different strokes for different folks.

No, not really. You can happily run blocking and non-blocking code on the JVM without problems. The inability of JavaScript to do that created the need to make everything asynchronous, not the other way around.