| Don't believe me? Try it:
Node.js: https://gist.github.com/3200829
Clojure: https://gist.github.com/3200862
Note that I picked really small messages here (integers) to give Node the best possible serialization advantage.
$ time node cluster.js
Finished with 10000000
real 3m30.652s
user 3m17.180s
sys 1m16.113s
Note the high sys time: that's IPC. Node also uses only 75% of each core. Why?
$ pidstat -w | grep node
11:47:47 AM 25258 48.22 2.11 node
11:47:47 AM 25260 48.34 1.99 node
That's about 96 context switches per second across the two node processes. Compare that to a multithreaded Clojure program using a LinkedTransferQueue, which easily eats 97% of each core. Note that the times here include ~3 seconds of compilation and JVM startup.
$ time lein2 run queue
10000000
"Elapsed time: 55696.274802 msecs"
real 0m58.540s
user 1m16.733s
sys 0m6.436s
Why is this version over 3 times faster? Partly because it requires only about 4 context switches per second.
$ pidstat -tw -p 26537
Linux 3.2.0-3-amd64 (azimuth) 07/29/2012 _x86_64_ (2 CPU)
11:52:03 AM TGID TID cswch/s nvcswch/s Command
11:52:03 AM 26537 - 0.00 0.00 java
11:52:03 AM - 26540 0.01 0.00 |__java
11:52:03 AM - 26541 0.01 0.00 |__java
11:52:03 AM - 26544 0.01 0.00 |__java
11:52:03 AM - 26549 0.01 0.00 |__java
11:52:03 AM - 26551 0.01 0.00 |__java
11:52:03 AM - 26552 2.16 4.26 |__java
11:52:03 AM - 26553 2.10 4.33 |__java
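For anyone who doesn't read Clojure, here is a rough Java sketch of the same queue pattern: one thread puts integers on a LinkedTransferQueue and another takes them off. This is not the code from the gist. The class name `QueueSketch`, the `run` helper, and the message count are made up for illustration, and the count is far smaller than the 10,000,000 used in the timings above.

```java
import java.util.concurrent.LinkedTransferQueue;

class QueueSketch {
    // Passes `messages` integers from a producer thread to a consumer thread
    // through a LinkedTransferQueue and returns the sum the consumer saw.
    static long run(int messages) {
        LinkedTransferQueue<Integer> queue = new LinkedTransferQueue<>();
        long[] sum = new long[1]; // written only by the consumer, read after join()

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < messages; i++) {
                    sum[0] += queue.take(); // blocks only while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // The calling thread acts as the producer. The queue is unbounded,
        // so put() never blocks.
        for (int i = 0; i < messages; i++) {
            queue.put(i);
        }

        try {
            consumer.join(); // join() makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        // Hypothetical message count, chosen only to keep the sketch quick.
        System.out.println(run(100_000));
    }
}
```

Both threads live in one process and share the heap, so a message is just an object reference handed across. Nothing gets serialized and nothing goes through the kernel unless a thread actually has to park.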
And queues are WAY slower than compare-and-set, which involves basically no context switching:
$ time lein2 run atom
10000000
"Elapsed time: 969.599545 msecs"
real 0m3.925s
user 0m5.944s
sys 0m0.252s
$ pidstat -tw -p 26717
Linux 3.2.0-3-amd64 (azimuth) 07/29/2012 _x86_64_ (2 CPU)
11:54:49 AM TGID TID cswch/s nvcswch/s Command
11:54:49 AM 26717 - 0.00 0.00 java
11:54:49 AM - 26720 0.00 0.01 |__java
11:54:49 AM - 26728 0.01 0.00 |__java
11:54:49 AM - 26731 0.00 0.02 |__java
11:54:49 AM - 26732 0.00 0.01 |__java
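The compare-and-set idea looks roughly like this in Java. Again, this is a sketch and not the gist's code: `CasSketch`, `run`, and the thread and iteration counts are my own placeholders. Each thread reads the current value, computes the next one, and retries if another thread got there first, which is the same retry loop Clojure's swap! runs on an atom.

```java
import java.util.concurrent.atomic.AtomicLong;

class CasSketch {
    // Starts `threads` workers that each increment a shared counter
    // `incrementsPerThread` times using an explicit compare-and-set loop.
    static long run(int threads, int incrementsPerThread) {
        AtomicLong counter = new AtomicLong(0);
        Thread[] workers = new Thread[threads];

        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    long seen;
                    do {
                        seen = counter.get();
                        // Retry if another thread changed the value in between.
                    } while (!counter.compareAndSet(seen, seen + 1));
                }
            });
            workers[t].start();
        }

        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 2 threads to match the 2-CPU box above.
        System.out.println(run(2, 1_000_000));
    }
}
```

A failed compareAndSet just loops and tries again in user space. No thread ever parks waiting on another, which fits the near-zero cswch/s numbers in the pidstat output.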
TL;DR: Node.js IPC is not a replacement for a real parallel VM. It lets you solve a particular class of parallel problems (namely, those which require relatively infrequent communication) on multiple cores, but shared state is basically impossible and message passing is slow. It's a suitable tool for problems which are largely independent and where you can defer the problem of shared state to some other component, e.g. a database. Node is great for stateless web heads, but is in no way a high-performance parallel environment. |
Also Node starts up in 35ms and doesn't require all those parentheses - both of which are waaaay more important.