| I have a slightly weird and probably completely unworkable "idea" that has nothing to do with this particular GIL improvement but with the broader GIL issue. The problem with removing the GIL seems to be garbage collection. I wonder why it's not possible to introduce a new type of object reference that exempts referenced objects from being garbage collected. Then multiple Python interpreters could be started in separate threads, each with its own unmodified GIL, and the only thing they could access would be their own thread local data and those special shared objects. What this amounts to is basically an implementation of a multi process architecture on top of a multi threaded architecture. The crucial difference is that the memory shared among the interpreters could hold pointers and thus proper in memory data structures and not just a BLOB into which everything has to be serialized as in the case of conventional shared memory. Of course the shared objects would have to be manually deleted. One issue I see is that when such a special object is created, all the objects it creates recursively would also have to be allocated in that pool of special objects. But I think it should be possible to use some kind of global flag to special case the allocator. Well, there is probably a huge number of issues with this kind of trickery and it's definately not a long term solution. But I'd love to use Python much more than I do and the GIL issue is what prevents that for me at the moment. |
I'm not sure this is the real problem; after all, reference counts can be atomically incremented and decremented.
The real problem is that primitive operations in Python (like "foo.bar") cannot safely be performed in C without locking, because you need the hash table to remain consistent while you are doing lookups and/or insertions. This forces you to either wrap all such operations in locks (which has been tried, and slows down the single-threaded case by something like 2x) or reimplement them with lock-free data structures. The latter could be an interesting experiment; you could probably implement a tree-based map using RCU.