Hacker News new | ask | show | jobs
by cscotta 5591 days ago
Pushing a language, its libraries, and extensions which have traditionally presumed linear execution into a threaded/parallel world is a minefield. I'd like to offer a few thoughts on some factors which might have contributed to this complexity, to reframe the question of what might constitute the "best" concurrency strategy in terms of tradeoffs, and conclude with a call for us to take up the task of reasoning about our programs in a parallel context.

We've seen an explosion of interest in non-threaded, single-process approaches to concurrency in the Ruby and Python communities in the past couple years. Much of the difficulty here lies in frameworks, libraries, and development paradigms which were not designed to be threadsafe (or to teach threadsafe programming) from the start. Developers and library authors have for years been able to operate under the assumption that Ruby code in a single process is executed serially and not in parallel. In environments which permit parallel, concurrent execution inside the boundaries of a single process, authors must return to the foundations of what they've created and scrutinize every bit, reconceptualizing their programs in terms of shared state and mutation.

Individual developers relying on these libraries must also scrutinize each and every gem they require to ensure that the code is threadsafe as well. This is not an easy task, as reasoning about state is inherently difficult, especially when the original program may not have been designed with concurrency in mind, and even moreso when one was not the original author. Further, we haven't seen a significant chasm between things which "are" or "are not" threadsafe emerge in the Ruby community, and it is not common to certify one's code as "threadsafe" on release. That said, it's commendable that the Rails team has worked diligently to piece through each component of the framework and certify it clean.

I do not mean to suggest that multithreaded code written in Ruby is uncommon, unsafe, or a bad idea in general. Many applications running in environments without a GIL/GVL such as JRuby utilize this functionality effectively today. What I'm suggesting is that process-based concurrency (Mongrel/Passenger/Unicorn, et al) is often favored by developers because it eliminates a large swath of potential pitfalls (or rather, trades them for increased usage of system resources).

In light of this, we've seen developers in the Ruby and Python communities experimenting with and popularizing alternate concurrency models, not the least of which include cooperatively-scheduled fibers/co-routines and evented programming. These approaches avoid a specific subset of the challenges in concurrent multithreaded execution, while enabling limited concurrency in one process. Matt is quick to point out that these models don't achieve true concurrent parallelism, but do offer significant benefit over standard serial execution. At the same time, they impose a different sort of complexity upon the programmer -- either requiring her to reason about a request or operation in terms of events and callbacks, or to use a library to cooperatively schedule multiple IO-bound tasks within a single VM (which may hide some complexity, but introduce uncertainty by clouding what is actually happening behind the scenes).

I wish Ruby and Python the best here. I have a significant investment in both. But as long as developers must ask of libraries "Is it threadsafe?" with fear and negative presumption, traditional multithreaded concurrency might not be ideal given the investment required to achieve it safely and correctly.

More than anything, I hope for continued complex reasoning and thought on this question. We simply must move beyond reductionist statements such as "threads are hard!" to progress. I'd suggest that threaded programming in and of itself is not hard per se -- reasoning about shared state is, and the developer bears a responsibility to either eliminate, or minimize and encapsulate it properly in her code. Evented and coroutine-oriented programming brings its own challenges. Actor models are interesting and may be appropriate for some contexts. STM's pretty cool too, but it would not be proper to hang one's hopes upon something which is not likely to cure all ills.

There is no panacea for concurrency. There will always be challenges and tradeoffs involved in developing performant, efficient applications within the constraints of both hardware and programmer resources. We would do well to push ourselves toward a greater understanding of the challenges in developing such programs, as well as the tradeoffs involved with each approach. I think this article is a step in the right direction.

1 comments

@cscotta as the author of the blog post, I can only agree with the concerns you raised and thank you for your kind words.

Concurrency is something both Ruby and Python have to take seriously pretty quickly if both communities want to mature and play a more important role in the coming years. As you pointed out, there are different ways to get some sort of concurrency and they all present challenges and tradeoffs while none will cure all ills. But at the end of the day what matters is a community of people embracing these approaches and pushing the language further.

By removing the Global Interpreter Lock most of the alternative Ruby implementations are making a statement and the community seems to react. If we educate the community, commonly used code will become threadsafe over time and developers will learn what it means to write threadsafe code. If made easy, co-routines and non-blocking IO will be used. If implemented well and properly explained, actors will be used more often when it makes sense to do so.

My goal was to try to simplify the concurrency problem so the community as a whole can discuss the topic. I think that the concurrency challenge can be improved by increasing the awareness, getting people motivated and having an open discussion. There is a reason why Ruby and Python still have a GIL and why removing it to improve concurrency would have its downside. I want to make sure people understand that before blaming Matz for not removing the GIL already. The Ruby implementation fragmentation also hurts progress done in MRI, if you look at it, alternative implementations often have more people working on them than people working on MRI!

Let's hope that this article is indeed a step in the right direction.

Absolutely - thanks, Matt. This is a very well-written, thorough article, and it provides a clear, thoughtful survey of the landscape for folks who are interested in parallelizing their programs and learning more about the advantages and tradeoffs implicit in different approaches. This piece has a lot of potential to catalyze thought within the community about how best to move forward.

The explosion of interest in the subject intrigues me. The work that's been going on in terms of making mainstream Ruby libraries threadsafe is great to see, along with fibers and evented approaches. MacRuby's exposure of Grand Central is especially unique. But interest and upvotes must also be followed by action. There will be a lot of false starts. We'll probably see a handful of approaches which bloom and fade from popularity. While the "fragmentation" issue is there, I tend to think of it in terms of spirited experimentation. In any case, people really care now, and it's that drive that forces a community forward. That's good to see.

Keep it up!