by jbooth 5365 days ago
So, I'm not a node.js guy, and nothing against it, but being a server-side guy in general, I'll paraphrase a quote that Zed Shaw paraphrased from Chinese Kung Fu novels:

"So the intermediate guy is doing all of these backflips and spinning roundhouse kicks and all that, it's very impressive, you couldn't imagine being in that good control of your body. He decides he's pretty good and spars with a master. The master barely even moves and puts the guy on his ass."

I think infatuation with event-driven i/o is one of those intermediate programmer things. We all took an advanced systems course in college, saw the literature from single-CPU days about how it makes so much sense, and more importantly saw how damn clever it was. There are situations where it makes a lot of sense (static file server), and then there are situations where you say "Wait, so you want me to run 8 instances of this thing in order to utilize my 8-core server?".


I think in this case, we have yet to see the master put anyone on their ass. We have non-masters slinging insults and ineffective demonstrations.

Don't typical Python and Ruby deployments also need 8 processes to utilize 8 cores? If I'm not mistaken, this is in fact how Heroku works.

As far as I'm aware, they do. GIL means that even using threads, they still need 8 processes to utilize 8 cores.

There are more languages than those 3, though, and if those 3 are the languages that you're considering, evented vs blocking i/o is way beside the point when it comes to performance. A super-simple thread-per-connection program with blocking i/o calls from C or Java will demolish the most sophisticated evented system you could ever design in a scripting language. That was my "barely even moves" vs "spinning roundhouse kick" comparison.

I did some benchmarks a few years ago playing around with different techniques in C#.

If you spin up a thread for each connection, and the connections are short lived, the time spent spinning up a thread will dominate the time spent communicating on the socket. Event loops perform much better than one-thread-per-connection in simple web services because they don't have the extra thread overhead. It's no accident the hello world app on nodejs.org is this sort of service.

But by far the fastest technique I found was to use a thread pool and an event loop together. When there was any work to do (new connection or new data), I scheduled the next available thread from a thread pool to do the work. This technique requires that you deal with both callback hell and thread synchronization. But it's blazing fast. Way faster than you can get with a single nodejs process.

The code is also many times more complicated, so it really depends on what you're doing. Despite knowing how to make a super high performance server, I do most of my work in nodejs; it performs fine in most applications.
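
To make the "thread pool and event loop together" idea concrete, here is a minimal sketch in Python rather than the C# used in the benchmarks above. It is an assumption about the general shape, not the original code: `run_loop`, `handle_readable`, the reversed-echo "work" and the `max_requests` bound are all hypothetical, and because of the GIL this shows the structure only, not the performance.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def handle_readable(conn: socket.socket) -> None:
    """Runs on a pool thread once the event loop has seen data on conn."""
    with conn:
        data = conn.recv(4096)
        if data:
            conn.sendall(data[::-1])  # hypothetical work: echo the bytes reversed


def run_loop(listener: socket.socket, pool: ThreadPoolExecutor, max_requests: int) -> None:
    """Single event loop that only watches for readiness; pool threads do the work.

    max_requests bounds the loop so the sketch terminates; a real server
    would loop until shutdown.
    """
    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ)
    dispatched = 0
    while dispatched < max_requests:
        for key, _events in sel.select(timeout=1.0):
            if key.fileobj is listener:
                # New connection: just start watching it, no thread used yet.
                conn, _addr = listener.accept()
                sel.register(conn, selectors.EVENT_READ)
            else:
                # New data: stop watching so it is not dispatched twice,
                # then hand the socket to the next free pool thread.
                sel.unregister(key.fileobj)
                pool.submit(handle_readable, key.fileobj)
                dispatched += 1
    sel.close()
```

Idle connections cost a selector entry rather than a thread; a thread is only occupied while there is actual work, which is the property described above. The synchronization burden shows up as soon as `handle_readable` touches shared state.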

Could you take a look at SignalR? (http://www.hanselman.com/blog/AsynchronousScalableWebApplica...)

* Since this is dealing with async operations, an IIS worker thread takes an incoming request and hands it off to be processed on the CLR thread pool. At this point, the IIS worker thread immediately becomes available for processing more requests

* Threadpool threads will not be tied up by open connections and waiting for IO. Only when executing actual code/work will they be in use

* To explain why threadpool threads are not tied up: Async operations use async IO. As soon as an I/O operation begins, the calling thread is returned to the thread pool to continue other work. When the I/O completes, a signal is sent and another thread is picked up to finish the work.

> A super-simple thread-per-connection program with blocking i/o calls from C or Java will demolish the most sophisticated evented system you could ever design in a scripting language

Are you really sure about that? Perhaps 5 years ago, but have you tried recent versions of node (V8) or Python? I have, and even as an old-timey programmer, I'm impressed.

Summary:

Ted: "Node is cancer, because it's not the one true tool that can do everything! It may be good at IO bound code but it's not so hot at CPU bound stuff".

Node hackers: "Yes it is the one true tool!".

Me: face-palm.

OK, Threads can be used to do anything, but they are hard, while async is pretty easy (unless you want to do CPU bound stuff). async sucks for CPU bound stuff, but that's not the problem it's trying to solve (and anyone who makes a big fuss over it from either side is a fool). But none of this really matters much, until you actually need to scale.

If you are doing something simple, C or Java will be best, simply because they are faster. But not everyone uses C or Java, and it's not like speed matters that much when your App server can scale trivially, your DB is the real bottleneck, and you only have 3 users (one of whom is your cat).

If you need a lot of connections, some of which block (due to a call to a web service like BrowserID or Facebook - and there are a lot more web sites that need to be optimized for web API calls than for calculating Fourier transforms), you need lots of processes (which is too heavy in Python), lots of threads (and then you need thread-safety, which is a pain, and very easy to screw up), or something async like Twisted or Tornado. Given that Tornado is already really easy to use, and basic async stuff is fairly easy to get right, the choice is easy for me. (I don't know enough about JS, Node, and V8 to really comment on Node, but I'll just assume it's roughly equivalent).

The thing is, I just don't trust threads (at least, not if I'm writing the code). There are far too many ways you can have weird bugs that won't show up without a massive testing system, or until you get a non-trivial number of users. And you can't integrate existing libraries without jumping through a lot of hoops.

Using callbacks looks like "barely even moves" to me, while multi-threaded code looks like a "spinning roundhouse kick", but then, maybe I'm just not good at multi-threaded code.

I guess the most important thing is that threaded code needs to be 100% thread-safe. Async code only has to be "async-safe" in the bits that need to be async. It looks like this:

    @async
    def do_something():
        do_something_not_async_safe()
        do_something_async(callback=finish_up)

    def finish_up():
        do_something_else_not_async_safe()

Whereas threads look like this:

    def threaded_code():
        do_something_100_percent_thread_safe()
        do_more_thread_safe_stuff()

        # fail here when you have a few users
        do_something_that_uses_an_unsafe_library()

        more_thread_safe_stuff()

If you really need to do CPU bound stuff, async sucks. You can create a second server, which handles the CPU bound stuff (and call it using a web interface which is async safe). Or there are pretty easy ways to call a subprocess asynchronously. But arguing over the merits of async programming using Fibonacci sequences as a talking point is not even wrong. Anyone who brings it up is just showing themselves to be a complete tool. That might be what Ted was trying to do (as many of the Node responses have been unbelievably lame), but it doesn't prove anything except that the internet has trolls, and plenty of idiots who still take the bait.

Please, show us even one Node developer who posted "Yes it is the one true tool!" – or even a sentiment that is remotely similar.
https://github.com/glenjamin/node-fib is close enough. Trying to explain how Node can do concurrent Fibonacci without calling another process is falling for the troll's trap - Node isn't "the one true tool", and you don't need to try to prove that it is. The trolls will just say that Fibonacci is too trivial, and you wouldn't use the same approach (co-operative multi-threading) for less trivial CPU-bound tasks.

You can use a hammer to bang in screws, but it's not usually the best way. If you are using co-operative multi-threading as a way to get Node to handle concurrent CPU-bound requests, you are Doing It Wrong (TM). In general, it's better to create a second (possibly not async) server to handle CPU intensive stuff, or fork stuff off to a subprocess (depending on the actual task). There might also be other ways - I'm not an expert.
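
As a rough illustration of forking CPU-intensive work off to a subprocess from async code, here is a sketch using Python's asyncio. It is not taken from node-fib or any project mentioned here: `FIB_SCRIPT`, `fib_in_subprocess` and `serve_many` are hypothetical names, and the naive Fibonacci just stands in for a real CPU-bound job.

```python
import asyncio
import sys

# Hypothetical CPU-bound job, executed in a separate Python interpreter.
FIB_SCRIPT = """
import sys
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)
print(fib(int(sys.argv[1])))
"""


async def fib_in_subprocess(n: int) -> int:
    """Start a child process for the heavy work and await its output."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", FIB_SCRIPT, str(n),
        stdout=asyncio.subprocess.PIPE,
    )
    # While the child computes, the event loop is free to serve other requests.
    out, _ = await proc.communicate()
    return int(out.decode().strip())


async def serve_many(ns):
    """Run several CPU-bound requests concurrently, one subprocess each."""
    return await asyncio.gather(*(fib_in_subprocess(n) for n in ns))
```

Calling `asyncio.run(serve_many([10, 20, 15]))` runs the three jobs in separate processes, so they can use separate cores while the event loop itself stays responsive. Spawning an interpreter per request is expensive; a long-lived worker server, as suggested above, avoids that cost.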

I'm sure that if glenjamin had a less trivial CPU-bound task, he/she would handle it a better way (depending on what the task was). But to the Node haters, node-fib is just troll food. It's also an interesting example of how node.js works, but the trolls don't care.

Hm...

Bunch of people with limited social skills or sanity, in a virtual desert, having flame-wars with strawmen?

Sounds like burning man.

No, Heroku doesn't work that way; Heroku lets you spin up processes, and those processes can be threaded. I use JRuby and get one process that can use all cores.

I think it's safe to say that typical Ruby deployments (as I qualified) are not using JRuby, especially not those on Heroku. Thus, even if the process spawns threads, they will be limited to one core at a time.

Whether JRuby's threads are actually running on multiple cores simultaneously depends on what mode Heroku is running the JVM in.

Except that event-driven io is actually the simpler solution compared to threads, isn't it?

Depends. For an application where you have significant CPU work, a simple worker pool (not coded by you) with an accept loop like

    while (s=accept()) { dispatchToWorkerPool(s) }

and an application-programmer method handleConnection(s) is pretty darned simple. Just do blocking I/O from your threads and rely on the machine to swap them in and out appropriately.
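
For a runnable version of that accept loop, here is a minimal Python sketch. It assumes the standard library's `ThreadPoolExecutor` as the "not coded by you" worker pool; `serve`, `handle_connection`, the uppercase-echo work and the `max_connections` bound are hypothetical and only there to keep the example small and finite.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def handle_connection(conn: socket.socket) -> None:
    """Application code: plain blocking I/O on a single connection."""
    with conn:
        data = conn.recv(4096)          # blocks this worker thread only
        if data:
            conn.sendall(data.upper())  # hypothetical work: uppercase echo


def serve(listener: socket.socket, max_connections: int, workers: int = 8) -> None:
    """Accept loop that hands every accepted socket to the worker pool.

    max_connections bounds the loop so the sketch terminates; a real
    server would keep accepting until shutdown.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_connections):
            conn, _addr = listener.accept()
            pool.submit(handle_connection, conn)
```

The application programmer only writes `handle_connection`; thread creation, reuse and scheduling are left to the pool and the OS. In CPython the GIL limits this to one core for pure-Python work, so the multi-core argument applies to runtimes like the JVM.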

For the next level, you can have an evented i/o layer that passes sets of buffers back and forth to your worker pool. That's more complex but still spares application programmers from having to worry about either eventing or threads, they just worry about passing back the right bytes.

Here's where things get interesting, though. It turns out someone studied this (in Java) and actually found that stupidly throwing 1k threads at the problem with blocking i/o performed better than a clever nonblocking server, due to linux 2.6's threading and context-switching improvements.

http://www.mailinator.com/tymaPaulMultithreaded.pdf

Some of that might be specific to Java's integration with linux, but think about it.. the linux guys did a pretty good job at getting the right thread to wake up and assigning it to runnable state. Moving to evented i/o might be the right move for your workflow but it also might make no difference at the cost of some additional complexity.

tl;dr context switching has gotten cheaper/better as linux has improved, 16 CPUs vs 1 CPU changes the math about context switching vs using select(), and worker pool libraries mean you shouldn't actually manage threads yourself.

That thread pool isn't actually that simple. How many threads do you use? If you throw 1,000 threads at the problem with a 2MB stack each, that's 2GB of DRAM you've thrown away (instead of 20MB * ncores per Node process) -- DRAM that could be caching filesystem data, for example, which could have a huge impact on overall performance.

With Node, the DRAM and CPU used scales with the number of cores and actual workload. With a thread pool, the DRAM used scales at least with the number of concurrent requests you want to be able to handle, which is often much larger than you can handle simultaneously (because many of them will be blocked much of the time).

Assuming you're not willing to reserve all that memory up front, the algorithm for managing the pool size also has to be able to scale quickly up and down (with low latency, that is) without thrashing.

What a load of crap... you just wasted 2gb of address space not of DRAM... You'll "waste" exactly up to the amount of stack each thread uses, rounded up to PAGE_SIZE which is usually 4kb

Let me guess, a nodejs fan?

You're right -- it's the touched pages that count. In many cases, that's a few MB per stack, which is what I said.
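
The disagreement above is between reserved address space and touched pages, and the arithmetic is easy to check. This sketch uses the figures from the thread (1,000 threads, 2MB stacks, 4kb pages); the 10,000 bytes of stack actually used per thread is a made-up number for illustration, and the function names are hypothetical.

```python
PAGE_SIZE = 4 * 1024  # 4 KB, the usual page size mentioned above


def reserved_bytes(threads: int, stack_size: int) -> int:
    """Address space reserved for thread stacks (not necessarily backed by DRAM)."""
    return threads * stack_size


def resident_bytes(threads: int, stack_used: int, page_size: int = PAGE_SIZE) -> int:
    """DRAM actually consumed: per-thread stack usage rounded up to whole pages."""
    pages_per_thread = -(-stack_used // page_size)  # ceiling division
    return threads * pages_per_thread * page_size


# 1,000 threads with 2 MB stacks reserve about 2 GB of address space...
reserved = reserved_bytes(1000, 2 * 1024 * 1024)
# ...but if each thread only touches 10,000 bytes (assumed), that is 3 pages each.
resident = resident_bytes(1000, 10_000)
```

Under that assumption the resident cost is about 12 MB, not 2 GB. If each thread really does touch a few MB of stack, the two numbers converge, which is the point conceded above.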

Alternatively, create the socket, fork N processes and have each of said processes run an accept loop. Assuming they're CPU bound, this is going to be pretty much just as efficient and you don't have to prat about with threads -or- evented I/O.

Ain't UNIX grand?
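
A minimal sketch of that pre-fork pattern in Python, Unix only since it relies on `os.fork`. The names `prefork_serve` and `requests_per_worker`, and the reply format (worker pid plus the echoed bytes), are hypothetical; the request bound exists only so the example terminates.

```python
import os
import socket


def prefork_serve(listener: socket.socket, workers: int, requests_per_worker: int) -> list:
    """Fork `workers` children that all accept() on the same listening socket.

    Returns the child pids to the parent. Each child serves a fixed number
    of requests and exits; a real server would loop until told to stop.
    """
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: inherits the listening socket and runs its own accept loop.
            try:
                for _ in range(requests_per_worker):
                    conn, _addr = listener.accept()  # kernel picks one waiting child
                    with conn:
                        data = conn.recv(4096)
                        conn.sendall(b"%d:" % os.getpid() + data)
            finally:
                os._exit(0)  # never fall back into the parent's code path
        pids.append(pid)
    return pids
```

The parent only creates the socket and forks; the kernel distributes incoming connections among the children blocked in `accept()`, so there is no thread synchronization and no event loop in user code.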

It depends, but I don't think either one is inherently simpler than the other. As a general rule of thumb, I find that asynchronous solutions are generally better for I/O bound stuff while threads are better for computationally expensive stuff.