by Spyro7 5365 days ago
Am I the only one that considers all of this ranting about Nodejs to be a little bit strange?

I would have never expected a post that was obviously a troll to prompt this much of a reaction on both sides of an issue. That so many of these rants and counter rants made it to the front page of Hacker News is somewhat discouraging.

I have been playing around with node for a few months, and I have tried to stay completely out of this "conversation". With that said, I would like to contribute just a few points:

Bad programmers will be bad programmers regardless of the tools that they use. If they use node and fail to write code that is completely non-blocking, then that is what we call a teachable moment. There is no magic button; all technologies have downsides and tradeoffs.

People keep talking about how nodejs is not good for computationally intensive tasks, but v8 is not a slow environment. Am I the only one that puts computationally intensive tasks into a queue to be taken care of by a pool of separate processes? I am only just getting into web programming, and it seemed fairly obvious to me that you would not put something like that into your main event loop.
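
A minimal sketch of the queue-plus-worker-processes idea above, written in Python rather than Node purely to stay in one language with the other snippets in this thread. `fib`, `handle_request`, and the pool size of 2 are hypothetical names and values chosen for illustration; nothing here is from the original post.

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor


def fib(n: int) -> int:
    """Deliberately slow, CPU-bound work (a stand-in for any heavy task)."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


async def handle_request(pool: Executor, n: int) -> int:
    # Queue the computation on the pool. This coroutine is suspended until
    # the result arrives, so the event loop stays free for other requests.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, fib, n)


async def main() -> list[int]:
    # The pool of separate processes; 2 workers is an arbitrary choice.
    with ProcessPoolExecutor(max_workers=2) as pool:
        return await asyncio.gather(
            *(handle_request(pool, n) for n in (10, 15, 20))
        )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The event loop only ever awaits results; the recursion itself runs in the worker processes.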

Also, if you find that you must put something computationally intensive in your main event loop, then you should use something like node-proxy or nginx to proxy those requests to a number of "nodes".

Over and over again, I have seen that people complain that node is not a good multithreaded environment. Well, yes... That is the tradeoff of using something that is closely tied to the concept of the event loop.

If you are using node because you feel comfortable with threads and you need threads, then you are making a serious mistake. If you are a new programmer and you are using node because someone told you that it is cool, then you are making a serious mistake. If you are using node because you have a problem that can be solved or addressed with an event loop and you understand the tradeoffs inherent in this approach, then you are doing the right thing.

To use nodejs effectively will often require rethinking your approach to fit the tool that you are using.

6 comments

So, I'm not a node.js guy, and nothing against it, but being a server-side guy in general, I'll paraphrase a quote that Zed Shaw paraphrased from Chinese Kung Fu novels:

"So the intermediate guy is doing all of these backflips and spinning roundhouse kicks and all that, it's very impressive, you couldn't imagine being in that good control of your body. He decides he's pretty good and spars with a master. The master barely even moves and puts the guy on his ass."

I think infatuation with event-driven i/o is one of those intermediate programmer things. We all took an advanced systems course in college, saw the literature from single-CPU days about how it makes so much sense, and more importantly saw how damn clever it was. There are situations where it makes a lot of sense (static file server), and then there are situations where you say "Wait, so you want me to run 8 instances of this thing in order to utilize my 8-core server?".

I think in this case, we have yet to see the master put anyone on their ass. We have non-masters slinging insults and ineffective demonstrations.

Don't typical Python and Ruby deployments also need 8 processes to utilize 8 cores? If I'm not mistaken, this is in fact how Heroku works.

As far as I'm aware, they do. The GIL means that even using threads, they still need 8 processes to utilize 8 cores.

There are more languages than those 3, though, and if those 3 are the languages that you're considering, evented vs blocking i/o is way beside the point when it comes to performance. A super-simple thread-per-connection program with blocking i/o calls from C or Java will demolish the most sophisticated evented system you could ever design in a scripting language. That was my "barely even moves" vs "spinning roundhouse kick" comparison.

I did some benchmarks a few years ago playing around with different techniques in C#.

If you spin up a thread for each connection, and the connections are short lived, the time spent spinning up a thread will dominate the time spent communicating on the socket. Event loops perform much better than one-thread-per-connection in simple web services because they don't have the extra thread overhead. It's no accident the hello world app on nodejs.org is this sort of service.

But by far the fastest technique I found was to use a thread pool and an event loop together. When there was any work to do (new connection or new data), I scheduled the next available thread from a thread pool to do the work. This technique requires that you deal with both callback hell and thread synchronization. But it's blazing fast. Way faster than you can get with a single nodejs process.

The code is also many times more complicated, so it really depends on what you're doing. Despite knowing how to make a super high performance server, I do most of my work in nodejs. It performs fine in most applications.

Could you take a look at SignalR? (http://www.hanselman.com/blog/AsynchronousScalableWebApplica...)

* Since this is dealing with async operations, an IIS worker thread takes an incoming request and hands it off to be processed on the CLR thread pool. At this point, the IIS worker thread immediately becomes available for processing more requests

* Threadpool threads will not be tied up by open connections and waiting for IO. Only when executing actual code/work will they be in use

* To explain why threadpool threads are not tied up: Async operations use async IO. As soon as an I/O operation begins, the calling thread is returned to the thread pool to continue other work. When the I/O completes, a signal is sent and another thread is picked up to finish the work.

> A super-simple thread-per-connection program with blocking i/o calls from C or Java will demolish the most sophisticated evented system you could ever design in a scripting language

Are you really sure about that? Perhaps 5 years ago, but have you tried recent versions of node (V8) or Python? I have, and even as an old-timey programmer, I'm impressed.

Summary:

Ted: "Node is cancer, because it's not the one true tool that can do everything! It may be good at IO bound code but it's not so hot at CPU bound stuff".

Node hackers: "Yes it is the one true tool!".

Me: face-palm.

OK, threads can be used to do anything, but they are hard, while async is pretty easy (unless you want to do CPU bound stuff). Async sucks for CPU bound stuff, but that's not the problem it's trying to solve (and anyone who makes a big fuss over it from either side is a fool). But none of this really matters much, until you actually need to scale.

If you are doing something simple, C or Java will be best, simply because they are faster. But not everyone uses C or Java, and it's not like speed matters that much when your App server can scale trivially, your DB is the real bottleneck, and you only have 3 users (one of whom is your cat).

If you need a lot of connections, some of which block (due to a call to a web service like BrowserID or Facebook - and there are a lot more web sites that need to be optimized for web API calls than for calculating Fourier transforms), you need lots of processes (which is too heavy in Python), lots of threads (and then you need thread-safety, which is a pain, and very easy to screw up), or something async like Twisted or Tornado. Given that Tornado is already really easy to use, and basic async stuff is fairly easy to get right, the choice is easy for me. (I don't know enough about JS, Node, and V8 to really comment on Node, but I'll just assume it's roughly equivalent).

The thing is, I just don't trust threads (at least, not if I'm doing the code). There's far too many ways you can have weird bugs that won't show up without a massive testing system, or when you get a non-trivial number of users. And you can't integrate existing libraries without jumping through a lot of hoops.

Using callbacks looks like "barely even moves" to me, while multi-threaded code looks like a "spinning roundhouse kick", but then, maybe I'm just not good at multi-threaded code.

I guess the most important thing is that threads need to be 100% thread-safe. Async code only has to be "async-safe" in the bits that need to be async. It looks like this:

    @async
    def do_something():
        do_something_not_async_safe()
        do_something_async(callback=finish_up)

    def finish_up():
        do_something_else_not_async_safe()

Whereas threads look like this:

    def threaded_code():
        do_something_100_percent_thread_safe()
        do_more_thread_safe_stuff()

        # fail here when you have a few users
        do_something_that_uses_an_unsafe_library()

        more_thread_safe_stuff()

If you really need to do CPU bound stuff, async sucks. You can create a second server, which handles the CPU bound stuff (and call it using a web interface which is async safe). Or there are pretty easy ways to call a subprocess asynchronously. But arguing over the merits of async programming using Fibonacci sequences as a talking point is not even wrong. Anyone who brings it up is just showing themselves to be a complete tool. That might be what Ted was trying to do (as many of the Node responses have been unbelievably lame), but it doesn't prove anything except that the internet has trolls, and plenty of idiots who still take the bait.
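
One way the "call a subprocess asynchronously" option might look, sketched with the standard library's `asyncio` rather than Tornado (an assumption; the comment does not say which mechanism it has in mind). `WORKER_CODE` and `run_cpu_job` are hypothetical, and the child here is just another Python interpreter doing a toy sum.

```python
import asyncio
import sys

# Hypothetical CPU-heavy job, run in a child interpreter for illustration.
WORKER_CODE = "import sys; n = int(sys.argv[1]); print(sum(i * i for i in range(n)))"


async def run_cpu_job(n: int) -> int:
    # Start the child process without blocking the event loop...
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", WORKER_CODE, str(n),
        stdout=asyncio.subprocess.PIPE,
    )
    # ...and let other coroutines run until the child has finished.
    stdout, _ = await proc.communicate()
    return int(stdout.decode().strip())


async def main() -> list[int]:
    # Both children run at the same time, each in its own process.
    return await asyncio.gather(run_cpu_job(10), run_cpu_job(100))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The async side only waits on a pipe, which is the kind of work an event loop is good at; the CPU-bound part lives in a separate process.
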
Please, show us even one Node developer who posted "Yes it is the one true tool!" – or even a sentiment that is remotely similar.
https://github.com/glenjamin/node-fib is close enough. Trying to explain how Node can do concurrent Fibonacci without calling another process is falling for the troll's trap - Node isn't "the one true tool", and you don't need to try to prove that it is. The trolls will just say that Fibonacci is too trivial, and you wouldn't use the same approach (co-operative multi-threading) for less trivial CPU-bound tasks.

You can use a hammer to bang in screws, but it's not usually the best way. If you are using co-operative multi-threading as a way to get Node to handle concurrent CPU-bound requests, you are Doing It Wrong (TM). In general, it's better to create a second (possibly not async) server to handle CPU intensive stuff, or fork stuff off to a subprocess (depending on the actual task). There might also be other ways - I'm not an expert.

I'm sure that if glenjamin had a less trivial CPU-bound task, he/she would handle it a better way (depending on what the task was). But to the Node haters, node-fib is just troll food. It's also an interesting example of how node.js works, but the trolls don't care.

Hm...

Bunch of people with limited social skills or sanity, in a virtual desert, having flame-wars with strawmen?

Sounds like burning man.

No, heroku doesn't work that way, heroku lets you spin up processes and those processes can be threaded. I use JRuby and get one process that can use all cores.
I think it's safe to say that typical Ruby deployments (as I qualified) are not using JRuby, especially not those on Heroku. Thus, even if the process spawns threads, they will be limited to one core at a time.

Whether JRuby's threads are actually running on multiple cores simultaneously depends on what mode Heroku is running the JVM in.

Except that event-driven io is actually the simpler solution compared to threads, isn't it?
Depends. For an application where you have significant CPU work, a simple worker pool (not coded by you), with an accept loop

    while (s=accept()) { dispatchToWorkerPool(s) }

And an application-programmer method handleConnection(s) is pretty darned simple. Just do blocking I/O from your threads and rely on the machine to swap in and out appropriately.
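
The accept loop above could be sketched roughly like this in Python. `handle_connection` plays the role of the application-programmer method; the upper-casing "work", the pool size, and the `stop` event are assumptions added for illustration, and the pool comes from the standard library rather than being hand-rolled.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_connection(conn: socket.socket) -> None:
    """Application code: plain blocking reads and writes, no callbacks."""
    with conn:
        data = conn.recv(1024)        # blocks only this worker thread
        conn.sendall(data.upper())    # hypothetical "work": upper-case echo


def serve(listener: socket.socket, stop: threading.Event, workers: int = 4) -> None:
    """Accept loop: hand each connection to the pool, then accept again."""
    listener.settimeout(0.1)          # wake up regularly to check `stop`
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while not stop.is_set():
            try:
                conn, _addr = listener.accept()
            except socket.timeout:
                continue
            pool.submit(handle_connection, conn)


if __name__ == "__main__":
    stop = threading.Event()
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    thread = threading.Thread(target=serve, args=(listener, stop))
    thread.start()
    with socket.create_connection(listener.getsockname()) as client:
        client.sendall(b"hello")
        print(client.recv(1024))
    stop.set()
    thread.join()
    listener.close()
```

The handler never sees the pool or the loop; it just does blocking I/O on the socket it was given.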

For the next level, you can have an evented i/o layer that passes sets of buffers back and forth to your worker pool. That's more complex but still spares application programmers from having to worry about either eventing or threads, they just worry about passing back the right bytes.

Here's where things get interesting, though. It turns out someone studied this (in Java) and actually found that stupidly throwing 1k threads at the problem with blocking i/o performed better than a clever nonblocking server, due to linux 2.6's threading and context-switching improvements.

http://www.mailinator.com/tymaPaulMultithreaded.pdf

Some of that might be specific to Java's integration with linux, but think about it.. the linux guys did a pretty good job at getting the right thread to wake up and assigning it to runnable state. Moving to evented i/o might be the right move for your workflow but it also might make no difference at the cost of some additional complexity.

tl;dr context switching has gotten cheaper/better as linux has improved, 16 CPUs vs 1 CPU changes the math about context switching vs using select(), and worker pool libraries mean you shouldn't actually manage threads yourself.

That thread pool isn't actually that simple. How many threads do you use? If you throw 1,000 threads at the problem with a 2MB stack each, that's 2GB of DRAM you've thrown away (instead of 20MB * ncores per Node process) -- DRAM that could be caching filesystem data, for example, which could have a huge impact on overall performance.

With Node, the DRAM and CPU used scales with the number of cores and actual workload. With a thread pool, the DRAM used scales at least with the number of concurrent requests you want to be able to handle, which is often much larger than you can handle simultaneously (because many of them will be blocked much of the time).

Assuming you're not willing to reserve all that memory up front, the algorithm for managing the pool size also has to be able to scale quickly up and down (with low latency, that is) without thrashing.

What a load of crap... you just wasted 2gb of address space not of DRAM... You'll "waste" exactly up to the amount of stack each thread uses, rounded up to PAGE_SIZE which is usually 4kb

Let me guess, a nodejs fan?

You're right -- it's the touched pages that count. In many cases, that's a few MB per stack, which is what I said.
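
The arithmetic behind this exchange, as a small sketch. The 2MB stack and the 4kb page size come from the comments above; the 10 KB of touched stack per thread is an invented figure for illustration, and the function names are hypothetical.

```python
import math

PAGE_SIZE = 4 * 1024           # "usually 4kb", per the comment above
STACK_SIZE = 2 * 1024 * 1024   # the 2MB stack from the earlier comment


def reserved_bytes(threads: int, stack_size: int = STACK_SIZE) -> int:
    """Address space set aside for stacks, whether or not it is ever used."""
    return threads * stack_size


def resident_bytes(threads: int, touched_per_thread: int,
                   page_size: int = PAGE_SIZE) -> int:
    """Memory actually backing the stacks: touched bytes rounded up to pages."""
    pages = math.ceil(touched_per_thread / page_size)
    return threads * pages * page_size


if __name__ == "__main__":
    print(reserved_bytes(1000))              # about 2 GB of address space
    print(resident_bytes(1000, 10 * 1024))   # assumed 10 KB touched per thread
```

How close the second number gets to the first depends entirely on how deep each thread's stack actually goes, which is the point the two comments disagree on.
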
Alternatively, create the socket, fork N processes and have each of said processes run an accept loop. Assuming they're CPU bound, this is going to be pretty much just as efficient and you don't have to prat about with threads -or- evented I/O.

Ain't UNIX grand?
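
A rough sketch of that pre-fork pattern in Python, Unix only since it relies on `os.fork`. The PID reply, the function names, and the SIGTERM-based shutdown are assumptions added so the behaviour is observable; they are not part of the comment above.

```python
import os
import signal
import socket


def worker_loop(listener: socket.socket) -> None:
    """Each child blocks in accept() on the shared socket; the kernel picks one."""
    while True:
        conn, _addr = listener.accept()
        with conn:
            conn.recv(1024)
            # Reply with this worker's PID so callers can see who served them.
            conn.sendall(str(os.getpid()).encode())


def prefork(listener: socket.socket, n: int) -> list[int]:
    """Fork n workers that inherit the already-listening socket."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:              # child: serve forever, never return
            try:
                worker_loop(listener)
            finally:
                os._exit(0)
        pids.append(pid)          # parent: remember the child and keep forking
    return pids


def stop_workers(pids: list[int]) -> None:
    """Terminate and reap every worker."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)


if __name__ == "__main__":
    listener = socket.create_server(("127.0.0.1", 0))
    workers = prefork(listener, 2)
    with socket.create_connection(listener.getsockname()) as client:
        client.sendall(b"ping")
        print("served by pid", client.recv(64).decode())
    stop_workers(workers)
    listener.close()
```

No thread pool and no event loop: the only shared state is the listening socket, and the operating system does the dispatching.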

It depends, but I don't think either one is inherently simpler than the other. As a general rule of thumb, I find that asynchronous solutions are generally better for I/O bound stuff while threads are better for computationally expensive stuff.
I think this post is about ridiculing the childish behavior of the Node.js fandom. I have to agree that I've never seen anything quite like this before except the Church of St. Jobs. I always thought that we software engineers are reasonable people. But apparently a subset of us aren't.

However reluctant I may be to join in this time-wasting back and forth, I'm just glad that someone has taken the time to point out just how inexperienced the crowd is and how ridiculous the whole situation is. We have enough technology in the world already. We don't need another one that makes making mistakes so much easier than before.

Lastly, I'm extremely disgusted by the misleading tag lines Ryan Dahl put on Node's front page. Programming isn't easy. It can only be easy for so long. Scaling is even harder because it actually requires deep knowledge and insight into how computers and networks work, something that I very much doubt that many of the current Node.js crowd understands. If your solution to your multiprogramming problem is to spawn more Node.js processes, maybe you've picked the wrong tech.

I've seen it from the following communities: PHP, Ruby, Perl, Python, Java, C++, C, C#, Haskell, Emacs, Vim, Common Lisp, and Scheme.

People are tribal. Some people are attracted to tribes, become attached without really knowing why, and start having the strong urge to fuck with the other tribes. Apparently this was good for the survival of the human race. Perhaps it's a bug now that we should consciously compensate for, like the desire to eat a box of doughnuts. mmm.... doughnuts... Good when you were a caveman. Bad when you sit in front of a desk for 16 hours a day and sleep for the other 8.

What's the alternative to scaling an application by running multiple copies? Scaling an application by running multiple threads? Exact same thing. The only advancement over this model is when the runtime can automatically parallelize your code, and you actually scale linearly when it does that. In the meantime, if you use a database server and not an in-memory database, guess what, you're "using queues" and "scaling by starting processes". People do it because it's easy and it works.

This whole thread/process/event queue nonsense is really ruining my appetite. Mmm fried bacon sausage wrap on a stick…

Guess what. When the packets come in from the network, they sit in an event queue. When Apache takes a request event, it takes it from a queue and hands it off to a process. If you are proxying, the webapp server in the back takes the request events from a queue and hands them off to a thread. When you make a DB call, the SQL goes to an event queue and the DB processes the statements one by one. Real-world web apps always have been, are, and always will be built from a combination of event queues, threads, and processes.

Nothing to see here. Moving on… oh look! Takoyaki!

> People are tribal. Some people are attracted to tribes, become attached without really knowing why, and start having the strong urge to fuck with the other tribes. Apparently this was good for the survival of the human race. Perhaps it's a bug now that we should consciously compensate for

That this tribal behavior occurs among software engineering is a rather disappointing fact. Computers are pretty much the edge of technology in many aspects; technological achievements by mankind should be proof that we're able to use our brains in more advanced ways than basic instincts and produce great achievements like modern software.

This flood of programmers that prefers to join a tribe instead of enjoying the good things from many different 'tribes' just makes me wonder how many of us are really devoted to do something useful/positive.

  > That this tribal behavior occurs among software engineering 
  > is a rather disappointing fact. 
It occurs amongst software engineering humans. All humans get this to some degree, it's basic ingroup/outgroup psychology.

We're all humans here. It's nothing to do with "devoted to doing something positive" or "using our brains in more advanced ways." It's just the reality of being an evolved ape, and the lack or presence of this trait doesn't make anyone any better/worse than anyone else.

Yes, thank you for writing this. We are all humans, even if we don't want to be, and we have to think about our actions in the context of our genetic programming. In the case of tribalism, even though it's a strong feeling, we have to ignore it because it doesn't get us anything in programming language debates.

The best attitude to have is one of acceptance and an open mind, because the right programming tool applied to the right problem can make solving that problem orders of magnitude less difficult. You can have programmer friends even if you don't unconditionally hate the enemy. In fact, it seems, most people don't care about who you don't hate.

"I always thought that we software engineers are reasonable people. But apparently a subset of us aren't."

Unfortunately no one is immune to it, not even engineers, scientists, Wall St., etc. I try to remind myself of this and keep my baser instincts in check by periodically rereading pieces like Charlie Munger's 'On the Psychology of Human Misjudgement'[1], the list of logical fallacies [2], and anything I can find relating to the psychology and neuroscience of judgement, decision making, perception and bias. Becoming emotionally invested in anything makes it more difficult to admit you're wrong about it and change your mind in the presence of refuting data.

[1] http://duckduckgo.com/?q=munger+on+the+psychology+of+human+m...

[2] http://duckduckgo.com/?q=list+of+logical+fallacies

The whole target of his dissatisfaction is this quote on the Node home page: "Almost no function in Node directly performs I/O, so the process never blocks. Because nothing blocks, less-than-expert programmers are able to develop fast systems." That's bullshit. Evented programming isn't some magical fairy dust. If your request handler takes 500ms to run, you're not going to somehow serve more than two requests per second, node or no node. It's blocked on your request handling.
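
The two-requests-per-second figure is just 1 / 0.5 s, and the effect is easy to reproduce. Below is a sketch using Python's `asyncio` as a stand-in for Node's event loop (an assumption), with the handler time scaled down to 50 ms so it runs quickly and `time.sleep` standing in for CPU-bound work. All names are hypothetical.

```python
import asyncio
import time

HANDLER_SECONDS = 0.05   # scaled-down stand-in for the 500ms handler


async def blocking_handler() -> None:
    time.sleep(HANDLER_SECONDS)            # holds the event loop, like CPU work


async def yielding_handler() -> None:
    await asyncio.sleep(HANDLER_SECONDS)   # waits without holding the loop, like I/O


async def serve_all(handler, requests: int) -> float:
    """Run `requests` handlers concurrently; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(requests)))
    return time.perf_counter() - start


def max_throughput(handler_seconds: float) -> float:
    """Upper bound on requests/second when every handler blocks the loop."""
    return 1.0 / handler_seconds


if __name__ == "__main__":
    print(f"blocking: {asyncio.run(serve_all(blocking_handler, 10)):.2f}s")
    print(f"yielding: {asyncio.run(serve_all(yielding_handler, 10)):.2f}s")
    print(f"500ms handler: at most {max_throughput(0.5):.0f} requests/second")
```

Ten blocking handlers take roughly ten times the handler duration, because the loop runs them one after another; ten yielding handlers overlap and finish in roughly one.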

And all that stuff Apache does for you? Well, you get to have fun setting that up in front of the nodejs server. Your sysadmin will love you.

Basically if you're doing a lot of file/network IO that would normally block, node is great. You can do stuff while it's blocked, and callbacks are easier to pick up and handle than threads. But how often does that happen? Personally my Rails app spends about 10% of its time in the DB and the rest slowly generating views (and running GC, yay). AKA CPU-bound work. AKA stuff Node is just as slow at handling, with a silly deployment process to boot.

Whoa, whoa. "Just as slow at handling?" It's most likely an order of magnitude faster at handling those.

But you're right, deployment isn't a totally solved problem. Unless you just use Heroku: http://devcenter.heroku.com/articles/node-js

I mean, Node was created in 2009. I don't see anybody bragging about how easy it is to deploy yet; just that it's fast and easy to understand.

V8 is a lot faster than Ruby, so I'd expect page rendering to be a lot faster with Node than with Ruby (supposing both use good rendering engines).

Also it seems a lot easier to "outsource" computationally intensive tasks to other processes with Node than with Rails. With node you can stall the response until you receive a result of the computation (within limits), and use the wait time for other tasks. With Rails if you do that, you block one thread of your limited pool of threads. As an example: with my first Heroku Rails app (free hosting) I discovered the pool of threads was one.

That's a limitation of Rails, not Ruby. One could use Node.js in the same manner. Ruby has been doing async I/O in the form of EventMachine longer than Node.js has even existed.
NodeJS builds on some C async framework, so pretty much every language built on C could go the Node route - but it would be a lot of work. Personally I wish somebody would adapt it for Arc - or maybe not, so that if I ever apply to YC again I can do that for bonus points :-)

EventMachine is OK, I suppose, but I have heard that it has some warts. Are there any good web frameworks that build on EventMachine? Though I suppose the basic Sinatra-Style framework would not be that hard to build...

As of 1.3, Sinatra can do streaming using an evented webserver like Thin, which uses EventMachine. Goliath may also be worth checking out.
> Am I the only one that puts computationally intensive tasks into a queue to be taken care of by a pool of separate processes?

Thank you. I really find it hard to believe that the reaction to Ted's post was to mess around with Fib implementations rather than to just say, "use a background queue," and move on.

If Ted has a point at all, it could be that he thinks that people using nodejs don't realize they're running in a loop. Maybe that's fair.

Ted hates queues, too.

But I don't, so instead of playing tough-guy and picking on random open source communities that don't want me, I get to write cool computer programs. Maybe we should all do the same.

What's his critique of queues?
It's here: http://teddziuba.com/2011/02/the-case-against-queues.html

TL;DR - It's not so much about queues, but about stacks, i.e. new software stacks. His proposition is that quite often you're better off with existing systems. He specifically mentions syslog, so you're logging your tasks and then the consumers monitor this log. Prevents data loss and lets you potentially restart lost tasks.

(That's his argument. I'd agree if you'd have to reimplement something like that in the pre-built *MQ solutions, but I don't know enough about all of them, maybe that – and more – is already in there.)

Interesting. I actually agree with his points (they are consistent with lessons I've learned overusing queues in the past).

I still use queues today, only more judiciously and with the issue of failure states a very well defined part of the design.

He seems to argue against having a blocking worker wait for the queue, which is exactly what you wouldn't do with Node.js
So I've worked on an older open source project that does exactly what you suggest. The main loop accepts socket requests using non-blocking IO, figures out how to handle that request, and then forks a child process to handle it appropriately.

This can work great and can scale very nicely, but it also has some major drawbacks compared to other solutions. The whole point of the original post was that single-threaded non-blocking event loops are very fragile. You can easily end up accidentally freezing the entire application.

Other problems with forking event loops include:

- Simple event loops require you to manually CPS-transform your code. This isn't an issue for a simple request-response cycle, but it quickly turns into spaghetti code for anything more complex. For a college class I once wrote a toy non-blocking P2P client in Python using entirely non-blocking IO, while all the other students chose to use threads. My client was very performant, but the code ended up being FAR more complex than everyone else's solutions for the simple reason that I had to code in CPS the complex multi-step interactions that occur between peers.

- If any of your tasks are not independent and must interact with each other, you have to start dealing with IPC, which can be very difficult to get right without a good message passing library. Even with one, it can still be more complicated than using threads. Clojure's STM, for instance, is useless without a shared memory model.

- Your application is not portable to Windows if you rely on forking.
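
To make the first point in the list above concrete, here is a toy three-step exchange written both ways. Everything in it is hypothetical: `send` is an invented non-blocking API on a made-up event loop, and the generator version is only an approximation of the straight-line code a threaded client gets to write.

```python
from collections import deque

pending = deque()   # toy event loop: a queue of ready callbacks


def send(peer: str, message: str, on_reply) -> None:
    """Hypothetical non-blocking send: schedules on_reply(reply) for later."""
    pending.append(lambda: on_reply(f"{peer} ack {message}"))


def run_loop() -> None:
    while pending:
        pending.popleft()()


# CPS style: every step of the exchange becomes a nested continuation.
def handshake_cps(peer: str, done) -> None:
    def after_hello(r1):
        def after_request(r2):
            def after_confirm(r3):
                done([r1, r2, r3])
            send(peer, "confirm", after_confirm)
        send(peer, "request", after_request)
    send(peer, "hello", after_hello)


# The same exchange written top to bottom, the way blocking code reads.
def handshake_gen(peer: str):
    r1 = yield "hello"
    r2 = yield "request"
    r3 = yield "confirm"
    return [r1, r2, r3]


def drive(gen_func, peer: str, done) -> None:
    """Adapter: run a generator-style handshake on top of the callback API."""
    gen = gen_func(peer)

    def step(reply=None):
        try:
            message = gen.send(reply)   # resume until the next yield
        except StopIteration as stop:
            done(stop.value)            # the generator's return value
            return
        send(peer, message, step)

    step()


if __name__ == "__main__":
    handshake_cps("peer-1", print)
    drive(handshake_gen, "peer-1", print)
    run_loop()
```

With three steps the nesting is tolerable; add branches, retries, and error handling at each step and the CPS version grows in every direction at once.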

And these are just some of the problems with using a simple non-blocking event loop to drive your entire application. Throw in the fact that you are using javascript, a very unscalable language never designed to do anything more than provide interactivity in the browser and that has incredibly sparse library support, and you've got lots more problems on top of that. Node.js just isn't as useful as its very enthusiastic proponents claim it to be.

This post from Glenn Vanderburg a few years back is incredibly relevant:

http://www.vanderburg.org/blog/Software/Development/sharp_an...

Especially this excerpt:

"Weak developers will move heaven and earth to do the wrong thing. You can’t limit the damage they do by locking up the sharp tools. They’ll just swing the blunt tools harder."