by nupark 5562 days ago
> As for the GIL, the author didn't even consider multiple processes ...

I really hate this argument for its disingenuity. Some things (including message passing) are naturally fastest with shared memory. Multiple processes are not an equivalent substitute.

Clojure, for instance, leverages shared memory to implement its excellent, lightweight STM and related high-level threading constructs, and shared memory is what allows Clojure to easily implement cheap, MVCC functional data structures.

Multiple processes aren't an answer, here. To implement the same high-level constructs efficiently requires re-introducing shared memory through more complex, less efficient mechanisms, and often leaves the problem of sharing access to higher-level constructs (objects, instead of data) unsolved.
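To make the difference concrete, here is a minimal Python sketch (Python only because the GIL is the topic; the `counts` table and `bump` function are made up for illustration). Threads mutate one object in place, while anything that crosses a process boundary has to be serialized, so the other side only ever holds a copy. Pickling stands in for the process boundary here; no real second process is started.

```python
import pickle
import threading

# Hypothetical shared structure: one counter table guarded by one lock.
counts = {"hits": 0}
lock = threading.Lock()

def bump(n):
    # Every thread mutates the very same dict; nothing is copied.
    for _ in range(n):
        with lock:
            counts["hits"] += 1

threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Crossing a process boundary means serializing. The receiver gets a copy,
# so an update on one side is invisible to the other.
wire_bytes = pickle.dumps(counts)
remote_copy = pickle.loads(wire_bytes)
remote_copy["hits"] += 1

print(counts["hits"])       # 4000
print(remote_copy["hits"])  # 4001
```

Keeping the two sides in sync after that point is exactly the extra machinery I'm complaining about.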

Multiple processes are a poor work-around for a lack of support for concurrency, not a solution. They make sense if you're sandboxing, but make little sense for implementing a high-performance concurrent server.

To pre-empt the Erlang discussion -- Erlang's message passing model does scale, but Erlang's runtime does require support for real shared memory concurrency in order to maximize single-machine performance.


> As for the GIL, the author didn't even consider multiple processes ... I really hate this argument for its disingenuity. Some things (including message passing) are naturally fastest with shared memory. Multiple processes are not an equivalent substitute.

This isn't a disingenuous answer at all, since it goes on to point out that a Web app is one giant workload where this model makes sense: multiprocessing is the approach you should be taking with a Web application.

As someone with a strong Java background, I have to continually force myself to be aware of my bias against concurrency-via-multiple-processes.

If you have a stateless application (and if at all possible, statelessness is a good thing [1]), then multiple processes scale vertically (on the same machine) reasonably well, and a multiple-process architecture also makes scaling horizontally (across machines) natural and easy.

It's true that shared access to higher-level constructs in a multiple process environment isn't solved, but (a) shared access to data is an anti-pattern [2], and (b) Memcache plus some kind of serialization works pretty well when you do need shared data.
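A rough sketch of what "Memcache plus some kind of serialization" can look like. `FakeMemcache` is a hypothetical in-memory stand-in that only accepts bytes, not a real client library, and `save_shared`/`load_shared` are names invented for this example.

```python
import pickle

class FakeMemcache:
    """Hypothetical stand-in for a memcached client; it stores bytes only."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes")
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

def save_shared(cache, key, obj):
    # Serialize before storing, since the cache only understands bytes.
    cache.set(key, pickle.dumps(obj))

def load_shared(cache, key):
    # Deserialize on the way out; a miss comes back as None.
    raw = cache.get(key)
    return None if raw is None else pickle.loads(raw)

cache = FakeMemcache()
profile = {"user": "alice", "visits": 3}
save_shared(cache, "profile:alice", profile)
loaded = load_shared(cache, "profile:alice")
```

Every process that loads the key gets its own copy of the data, which is the trade-off being argued about in this thread.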

[1] Yes, applications exist where statelessness doesn't make sense. In this case, though, the author was originally developing on AppEngine, so I doubt there is much state in the app.

[2] I'm not saying you should never have shared data. I am saying you should avoid it as much as possible.

Shared memory makes perfect sense for a webapp, too. I question the whole stateless mantra -- if state is reconstructable on other nodes, then what is wrong with state that improves performance, such as caching?

Everything from database connections to user data can be cached locally, shared across connections, and does not require multiple processes each with their own large heap.
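As a minimal sketch of that (in Python, with made-up names like `LocalCache` and `load_user`), one in-process cache can serve every request thread, so each key is loaded once per process instead of once per worker process:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class LocalCache:
    """Hypothetical per-process cache shared by all request threads."""

    def __init__(self, loader):
        self._loader = loader
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key not in self._data:
                # Loading under the lock keeps the sketch simple: each key
                # is loaded once, at the cost of serializing cache misses.
                self._data[key] = self._loader(key)
            return self._data[key]

load_calls = []

def load_user(user_id):
    # Stand-in for an expensive lookup; records how often it really runs.
    load_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}

cache = LocalCache(load_user)

def handle_request(user_id):
    return cache.get(user_id)["name"]

with ThreadPoolExecutor(max_workers=8) as pool:
    names = list(pool.map(handle_request, [1, 2, 1, 2, 1, 2, 1, 2]))
```

Eight requests, two loads. With eight separate processes and no external cache, each process would have to do its own lookups.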

I'd like to elaborate on more examples (such as comet and efficient handling of a large number of blocked connections while allowing unblocked connections to proceed concurrently while maintaining cached local state) ... But I'm on an iPad and this keyboard is driving me crazy.

Yep, caching is great.

But I think caching is something best done in system/platform code, not in your application.

Like you say, there are plenty of examples where sharing state makes sense. But I still believe that state in application code is something that should be avoided, and that most of the time it's best to rely on platforms that do state management for you.

Session support in web platforms is a great example of something that supplies (simulated) state and is itself supplied by the platform.

The most efficient way for a platform to cache many types of state is in the process itself, local to where that state is required. Database connection pooling, for example, is more efficient implemented within a single multithreaded process, where the entirety of the pool is immediately available to all concurrent connections, and cache locality exists between the data looked up (and cached) and the connections using that data.
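A minimal sketch of such a pool in Python, assuming a hypothetical `FakeConnection` in place of a real database driver. The whole pool lives in one process, so any request thread can borrow any idle connection:

```python
import queue
import threading
from contextlib import contextmanager

class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    def __init__(self, conn_id):
        self.conn_id = conn_id

    def query(self, sql):
        return f"conn {self.conn_id}: {sql}"

class ConnectionPool:
    def __init__(self, size):
        # queue.Queue is thread-safe, so no extra locking is needed here.
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(FakeConnection(i))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back

pool = ConnectionPool(size=2)
used_ids = []
used_lock = threading.Lock()

def handle_request(n):
    with pool.connection() as conn:
        conn.query(f"SELECT {n}")
        with used_lock:
            used_ids.append(conn.conn_id)

threads = [threading.Thread(target=handle_request, args=(n,)) for n in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Ten requests share two connections. Split the same load across ten processes and each one needs its own connection, or a pooling proxy sitting outside the app.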

I have a hard time with the notion that web apps should be "stateless." It seems to be an argument born out of limitations of the frameworks/platforms being used, and repudiated by the fact that such platforms do share state, but are forced to use less efficient external mechanisms (such as network requests to memcached, the local database, etc.) rather than leveraging the data locality available in a non-multiprocess system.

We've often taken advantage of sticky sessions to allow individual servers to maintain and share state across requests while ensuring that the state could be reconstructed by another server should it become necessary. It is simply more efficient to do so, and efficiency in implementation directly translates to dollars spent on operational costs, as well as effects on human observable response times.
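Roughly, and with made-up names throughout (`Server`, `pick_server`, and the `backing_store` dict are illustrative, not any real load balancer's API), the pattern looks like this: requests for a session are routed to the same server, which keeps derived state in local memory, and any other server can rebuild that state from the durable store if the first one disappears.

```python
import hashlib

class Server:
    """Hypothetical app server keeping per-session state in local memory."""

    def __init__(self, name, backing_store):
        self.name = name
        self._backing_store = backing_store
        self._local = {}
        self.rebuilds = 0

    def handle(self, session_id):
        if session_id not in self._local:
            # Local miss: rebuild the state from the durable source of truth.
            record = self._backing_store[session_id]
            self._local[session_id] = {"greeting": f"hello {record['user']}"}
            self.rebuilds += 1
        return self._local[session_id]["greeting"]

def pick_server(session_id, servers):
    # Stable hash, so the same session keeps landing on the same server.
    digest = hashlib.sha256(session_id.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

backing_store = {"s1": {"user": "alice"}, "s2": {"user": "bob"}}
servers = [Server("a", backing_store), Server("b", backing_store)]

first = pick_server("s1", servers)
for _ in range(3):
    first.handle("s1")  # one rebuild, then two local hits

# Failover: the sticky server goes away and another one rebuilds the state.
survivors = [s for s in servers if s is not first]
fallback = pick_server("s1", survivors)
reply = fallback.handle("s1")
```

The local state is purely an optimization here: losing a server costs one rebuild, not correctness.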

There are plenty of real-world, non-trivial situations in which a web app needs to employ caching. For example, fulfilling a user request may require data from an expensive web service call whose result is likely to be shared across user requests. How does an application avoid managing this cache itself?

Incidentally, I think it's ironic that I'm "defending" an imperative language by using lack-of-state as a feature when it is being compared to a functional language.

But I think it's an illuminating discussion anyway.