Hacker News new | ask | show | jobs
by chadr 3667 days ago
Semi related to this, is it possible to run multiple CPython interpreters in the same process, but with restrictions on shared memory? The idea being that each interpreter would still have its own GIL, each interpreter would have restrictions on shared memory (like sharing immutable structures only through message passing). Note, I'm not a big Python user so if this already exists, has been discussed, etc I am not aware.
2 comments

I'm not deeply familiar with the CPython API, but that should be completely straightforward as long as you don't trade native CPython structures between interpreters. If you start with something like Numpy's array interface, and implement an array in C that knows that it could be accessed by multiple Python interpreters and needs to become immutable before it's passed, it should work. But I wouldn't expect it to be easy to take a regular PyObject and move it between interpreters, so the cost of serialization and deserialization might be an issue.
Sort of.

Note that Python has support for shared memory:

https://docs.python.org/2/library/multiprocessing.html#shari...

In fact, `numpy` has its own mechanisms to support shared memory between processes:

https://bitbucket.org/cleemesser/numpy-sharedmem

Neither of these approaches seem to be used very commonly in practice.

Python itself has some "sub-interpreter" support. There was a long conversation about this last year:

https://mail.python.org/pipermail/python-ideas/2015-June/034...

Finally, I have a working approach using `dlmopen` to host multiple interpreters within the same process:

https://gist.github.com/dutc/eba9b2f7980f400f6287

- the approach is so bizarre, because it's a very naïve multiple-embedding. It was intended to prove that you could run a Python 2 and a Python 3 together in the same process as part of a dare. This was thought impossible, since there are symbols with non-unique names that the dynamic linker would be unable to distinguish (which lead me to the `RTLD_DEEPBIND` flag for `dlopen`,) and that there is global state in a Python interpreter that interacts in undesirable ways (which lead me to `dlmopen` and linker namespaces.)

- this approach is stronger than the traditional subinterpreter approach, since I can host multiple interpreters of distinct versions. i.e., I can host a Python 1.5 inside a Python 2.7 inside a Python 3.5.

- the approach is stronger in that I completely isolate C libraries. There's a good amount of functionality provided by C libraries that maintain global state. e.g., `locale.setlocale` is a wrapping of C stdlib locale and is globally scoped.

- this approach is weaker in that it requires a dynamic linker that supports linker namespaces, which effectively limits its use on Windows

- this approach is weaker in that it's not complete: there's insufficient interest in this approach for me to actually write the shims to allow communication between processes.

- this approach is weaker in that it has some weird restrictions such as being able to spawn only 15 sub-interpreters before running out of thread-local storage space

I suppose the premise is that the GIL-removal efforts involve pessimistic coördination. A sub-interpreter approach might have a lighter touch and allow the user to handle coördination between processes (perhaps even requiring/allowing them to handle locks themselves.)

I followed the sub-interpreter thread with great interest, but after the implementation went in I haven't seen anyone build the kind of multiprocess tooling that it was designed to enable. Have you heard of anything?