by fauigerzigerk 5999 days ago
I have a slightly weird and probably completely unworkable "idea" that relates not to this particular GIL improvement but to the broader GIL issue.

The problem with removing the GIL seems to be garbage collection. I wonder why it's not possible to introduce a new type of object reference that exempts referenced objects from being garbage collected.

Then multiple Python interpreters could be started in separate threads, each with its own unmodified GIL, and the only thing they could access would be their own thread local data and those special shared objects.

What this amounts to is basically an implementation of a multi process architecture on top of a multi threaded architecture. The crucial difference is that the memory shared among the interpreters could hold pointers and thus proper in memory data structures and not just a BLOB into which everything has to be serialized as in the case of conventional shared memory.
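To make the "BLOB" point concrete, here is a minimal sketch of what conventional shared memory looks like from Python today. It assumes Python 3.8+ and the standard `multiprocessing.shared_memory` module; the helper names `put_object`, `get_object` and `roundtrip` are made up for illustration. The object has to be pickled into a flat byte buffer on one side and rebuilt as a private copy on the other, so no pointers are actually shared.

```python
import pickle
from multiprocessing import shared_memory


def put_object(obj):
    """Serialize obj into a new shared-memory block; return (block, size)."""
    payload = pickle.dumps(obj)
    block = shared_memory.SharedMemory(create=True, size=len(payload))
    # The shared block is only a flat byte buffer, so we copy the pickle into it.
    block.buf[:len(payload)] = payload
    return block, len(payload)


def get_object(name, size):
    """Attach to an existing block by name and rebuild a private copy."""
    block = shared_memory.SharedMemory(name=name)
    try:
        return pickle.loads(bytes(block.buf[:size]))
    finally:
        block.close()


def roundtrip(obj):
    """Write obj to shared memory and read it back (hypothetical helper)."""
    block, size = put_object(obj)
    try:
        return get_object(block.name, size)
    finally:
        block.close()
        block.unlink()  # shared blocks must be released manually


if __name__ == "__main__":
    tree = {"a": [1, 2, {"b": 3}]}
    copy = roundtrip(tree)
    # Equal contents, but a distinct object graph: nothing is shared by reference.
    print(copy == tree, copy is tree)
```

The reader ends up with an equal but separate object graph, which is exactly the limitation the shared-object idea is trying to get around.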

Of course the shared objects would have to be manually deleted.

One issue I see is that when such a special object is created, all the objects it creates recursively would also have to be allocated in that pool of special objects. But I think it should be possible to use some kind of global flag to special case the allocator.

Well, there is probably a huge number of issues with this kind of trickery and it's definitely not a long term solution. But I'd love to use Python much more than I do and the GIL issue is what prevents that for me at the moment.

3 comments

> The problem with removing the GIL seems to be garbage collection.

I'm not sure this is the real problem; after all, reference counts can be atomically incremented and decremented.

The real problem is that primitive operations in Python (like "foo.bar") cannot safely be performed in C without locking, because you need the hash table to remain consistent while you are doing lookups and/or insertions. This forces you to either wrap all such operations in locks (which has been tried, and slows down the single-threaded case by something like 2x) or reimplement them with lock-free data structures. The latter could be an interesting experiment; you could probably implement a tree-based map using RCU.
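As a rough Python-level analogy of the "wrap every operation in a lock" option (the real work would happen in the interpreter's C code, not in Python), something like the sketch below. The class `LockedNamespace` and the helper `hammer` are hypothetical names, and this says nothing about the 2x figure; it only shows the shape of the approach, where every lookup or insertion pays for a lock acquisition even when only one thread is running.

```python
import threading


class LockedNamespace:
    """Attribute table whose every operation is guarded by a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._attrs = {}

    def get(self, name):
        # Even a plain lookup takes the lock so the table stays consistent.
        with self._lock:
            return self._attrs[name]

    def set(self, name, value):
        with self._lock:
            self._attrs[name] = value

    def increment(self, name, amount=1):
        # Read-modify-write done under one lock acquisition.
        with self._lock:
            self._attrs[name] = self._attrs.get(name, 0) + amount
            return self._attrs[name]


def hammer(ns, name, threads=4, per_thread=1000):
    """Increment ns[name] from several threads and return the final value."""

    def work():
        for _ in range(per_thread):
            ns.increment(name)

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return ns.get(name)


if __name__ == "__main__":
    print(hammer(LockedNamespace(), "hits"))
```

The lock-free alternative would replace the single lock with a structure that readers can traverse without blocking writers, which is a much bigger change than this sketch suggests.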

As does perl. You need to declare variables as shared, there is apparently some performance hit, but it's mostly localized to the shared variables, so reasonably manageable. I've actually found it easier to work with than implicit shared data across all threads because you actually need to think about the exposed surface area and how you can minimize it.
People write (http://www.julmar.com/blog/mark/PermaLink,guid,3670d081-0276...) that even .NET itself has a similar kind of GC (partially concurrent, only address fixup stops the world). Unfortunately I can't check how it compares to Singularity.
I'm pretty sure someone built a Python like that, but it did not work out. It was someone from the #python freenode channel whose name I don't recall. I think perhaps C modules were one of the issues, or sharing of global scopes (which then need fine-grained locks to synchronize access).

BTW, there's one multi-processing shared objects module here: http://poshmodule.sourceforge.net/