by SomeCallMeTim 5382 days ago
Very cool, but it's too bad you have to deal with all that to begin with.

I've been using LuaJIT embedded in Nginx (LuaNginxModule). Lua supports coroutines, so a function can just yield. Here's a brief example:

    local cjson = require("cjson")

    -- Query a database using an http backend.
    -- Yields and handles other requests until the reply is complete.
    local record = util.getUserRecord(userId)

    -- Send some text to the client. Yields control while
    -- the actual transfer is in progress.
    ngx.print("Result=")

    -- Send the record, encoded as JSON, to the client.
    -- Again, this call doesn't block the server.
    ngx.print(cjson.encode(record))
    
With code like the above I can easily handle thousands of concurrent connections per second on the lowest-end Linode VPS node available, with barely any load on the box, and I'm told it should be able to handle 40k+ connections per second if I were to do any tuning. Oh, and I have only 512 MB of RAM, which it doesn't even come close to using under load. And the longest request took less than 500ms at high load.

I've been using OpenResty [1] which has the Lua module and a bunch of others all configured together. Works great, and I can't complain about the performance.

Someday I'm sure I'll hand the maintenance of this off, and then I might regret not using one of the "popular" frameworks. But the code is SO straightforward using this stack -- and what I'm using it for is so simple -- I think not.

[1] http://openresty.org/


IIRC, the original plan for Node involved coroutines (maybe even Lua?). But, they were scrapped because they introduced a lot of the same problems as full threading. You never know if the state of the world is going to change in ways unrelated to the function you are about to call because some deep-nested subroutine might yield.

> You never know if the state of the world is going to change

It's the same situation when using callbacks.

The difference is that, for the duration of the callback, you know the state will only change in ways directly related to the functions you are calling. Between callbacks, anything could happen. But, at least you have an island of sanity within the callback. With coroutines however, any function you call might block for some obscure reason (logging some debug info to a file, for example). That means you can only be sure that the world won't move behind your back as long as you don't call any functions :)
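A minimal sketch of that hazard, using plain Lua coroutines rather than any particular server. The names `balance`, `log_debug`, and `withdraw` are invented for illustration; the only point is that a yield hidden inside a helper lets another coroutine run between a check and the update that depends on it.

```lua
-- Shared state visible to every coroutine (hypothetical example data).
local balance = 100

-- An innocent-looking helper that happens to yield, e.g. while
-- waiting on a slow log write. The message itself is ignored here.
local function log_debug(msg)
  coroutine.yield()
end

-- Check-then-act: the check and the update are separated by a call
-- that may yield, so the check can be stale by the time we act.
local function withdraw(amount)
  if balance >= amount then
    log_debug("withdrawing " .. amount)  -- hidden yield point
    balance = balance - amount
    return true
  end
  return false
end

local a = coroutine.create(function() return withdraw(80) end)
local b = coroutine.create(function() return withdraw(80) end)

-- Stand-in for an event loop interleaving two requests.
coroutine.resume(a)  -- a passes the check, then yields inside log_debug
coroutine.resume(b)  -- b passes the same check, then yields
coroutine.resume(a)  -- a subtracts 80
coroutine.resume(b)  -- b subtracts 80 as well

print(balance)  -- prints -60: both withdrawals went through
```

With callbacks the same interleaving is possible, but the gap between check and update would be visible as a callback boundary instead of being buried in `log_debug`.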

I don't have a lot of experience in this area. I'm just reporting back what I remember hearing.

> The difference is that, for the duration of the callback, you know the state will only change in ways directly related to the functions you are calling.

That is true for a single-threaded reactor, but in practice I haven't found it to be an especially useful property, because naturally it doesn't cover external state (e.g. the database).

Getting atomicity right in evented code has in fact often been a hairier issue for me than doing the same with coroutines or threads, because when you're never allowed to block you quickly find yourself needing a retry mechanism.

Such codebases then tend to converge quickly towards the actor pattern (tied together by queues) which, ironically, could have been had much more easily by starting out with coroutines in the first place.
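To make "actors tied together by queues" concrete, here is a rough sketch in plain Lua. Everything in it (`spawn`, `send`, `step`, the counter) is made up for illustration and is not taken from any framework; a real system would have a scheduler calling `step` for you.

```lua
-- A hypothetical actor: a coroutine that drains its own mailbox,
-- handling one message at a time.
local function spawn(handler)
  local mailbox = {}
  local co = coroutine.create(function()
    while true do
      while #mailbox == 0 do
        coroutine.yield()  -- nothing to do, give control back
      end
      handler(table.remove(mailbox, 1))
    end
  end)
  return {
    -- Queue a message; never runs the handler directly.
    send = function(msg)
      table.insert(mailbox, msg)
    end,
    -- Let the actor run until its mailbox is empty again.
    step = function()
      coroutine.resume(co)
    end,
  }
end

-- State that, by convention, only the actor's handler modifies.
local total = 0
local counter = spawn(function(msg)
  total = total + msg.amount
end)

counter.send({ amount = 5 })
counter.send({ amount = 7 })
counter.step()  -- the scheduler's job in a real system

print(total)  -- prints 12
```

Because messages are handled strictly one after another, each update is atomic with respect to the others without any retry logic.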

Isn't that sort of the point? If you use node, you never know if the state of the world is going to change in ways unrelated to the callback you are about to pass, because you expect the world to change between now and the invocation of the callback.

The real reason will be that V8 does not support co-routines.

I think corysama is referring to Ryan Dahl experimenting with Lua (as well as C and Haskell) before settling on JS/V8.

> I had several failed private projects doing the same on C, Lua, and Haskell. Haskell is pretty ideal but I’m not smart enough to hack the GHC. Lua is less ideal but a lovely language - I did not like that it already had a lot of libraries written with blocking code. No matter what I did, someone was going to load an existing blocking Lua library in and ruin it. C has similar problems as Lua and is not as widely accessible. There is room for a Node.js-like libc at some point – I would like to work on that some day.

> V8 came out around the same time I was working on these things and I had a rather sudden epiphany that JavaScript was actually the perfect language for what I wanted: single threaded, without preconceived notions of “server-side” I/O, without existing libraries.

http://bostinnovation.com/2011/01/31/node-js-interview-4-que...

The threading model in Haskell (GHC) would be ideal, as it's essentially abstracted away from the developer. There's certainly the potential for a performance advantage there, as well as easier-to-maintain code. The advantage Node.js has is that the vast majority of web developers already know JavaScript. It doesn't need to be the "perfect" server-side implementation, it just needs to be good enough. Its success is due to its accessibility, something he would have missed out on had he gone the Haskell path.

It's not just potential - all the major Haskell web frameworks can handle more concurrent requests than node.js and scale to multi-core. http://www.yesodweb.com/blog/2011/03/preliminary-warp-cross-...

I don't think node.js is good enough because you have to deal with the issue being discussed here. In Haskell, as you said, you just write normal code with no worries about mutable state messing with your thread.

Mozilla's JS engine does, and there is at least some work being done to hook Node up to it.

https://developer.mozilla.org/en/JavaScript/Guide/Iterators_...

Ahh, but here's the deal: The way the Nginx Lua module is written, you store everything in local variables. Those stay consistent across yields, and are tied to a single invocation of the Lua function.

So except for things you should EXPECT to change (like the state of a database you're querying) between calls, the "world" you (as a programmer) should care about stays perfectly consistent.
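Here's roughly what I mean, as a sketch in plain Lua with no Nginx involved. `handle_request` is a made-up handler, and the bare `coroutine.yield()` stands in for whatever I/O wait the module would do on your behalf.

```lua
-- Hypothetical request handler: everything it needs lives in locals.
local function handle_request(user_id)
  local greeting = "hello " .. user_id
  coroutine.yield()   -- stand-in for waiting on I/O
  return greeting     -- unchanged by whatever ran in the meantime
end

-- Two in-flight "requests", each with its own coroutine and locals.
local r1 = coroutine.create(handle_request)
local r2 = coroutine.create(handle_request)

-- Interleave them the way an event loop might.
coroutine.resume(r1, "alice")
coroutine.resume(r2, "bob")
local _, out1 = coroutine.resume(r1)
local _, out2 = coroutine.resume(r2)

print(out1)  -- prints "hello alice"
print(out2)  -- prints "hello bob"
```

Each coroutine keeps its own copy of `greeting`, so the second request starting in the middle of the first doesn't disturb it.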

Unless I'm not understanding what you're asking -- is there some situation that I haven't encountered where the state of something that isn't a local variable matters?

The problem wasn't intrinsic to coroutines as much as it was the way they made the event loop reentrant which is a horribly complex thing to track. In most cases like that you'd either suspend the loop or have coroutine local event loops.

Of course people like to point at the overhead of native threads and assume coroutines have similar overhead, which is total bunk. Ironically, event sources in node use a stack-like tracking element which brings back a similar sort of overhead you see in coroutines in Lua, for example.

I doubt we'll see node take another shot at coroutines, but that's okay. Node will do fine without coroutines, but it will come at the cost of making certain types of code a little less natural (same as evented code in threads becomes unnatural).

Can you give an example of this?

This kind of stuff always makes me wonder why particular stacks are the most popular.

Wondering, is the OpenResty solution otherwise single-threaded-with-an-eventloop much like node?

Well, you can configure how many processes nginx runs with at startup, but otherwise yes it is an event loop based system. Lua is just a module like anything else in Nginx, and uses the Nginx event loop.

Ah right, I see.

From the nginx wiki:

> Unlike Apache's mod_lua and Lighttpd's mod_magnet, Lua code written atop this module can be 100% non-blocking on network traffic as long as you use the ngx.location.capture or ngx.location.capture_multi interfaces to let the nginx core do all your requests to mysql, postgresql, memcached, redis, upstream http web services, and etc etc etc (see HttpDrizzleModule, ngx_postgres, HttpMemcModule, HttpRedis2Module and HttpProxyModule modules for details).

Is that talk-via-nginx-commands thing cumbersome in practice?

No, it's pretty easy. An example:

    local doc = ngx.location.capture("/couchdb/hamster/" .. uid)

This queries the local CouchDB for a particular user record and stores the response in a local variable; the record itself is in doc.body.