by marcher 6400 days ago
Python 2.6 and 3.0 ship with a new library called multiprocessing, which provides mechanisms for process control and interprocess communication. Because it uses processes rather than threads, it conveniently sidesteps the GIL limitation that prevents threads from using multiple cores.

The library is loosely based around the existing threading API, and allows for seamless transfer of Python objects between processes (unlike other forms of IPC).
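
A minimal sketch of what that looks like, assuming a Unix-like system where the "fork" start method is available. The helper names (square_all, run_in_child) are made up for illustration; Process and Queue are the pieces that mirror threading.Thread and queue.Queue.

```python
import multiprocessing as mp


def square_all(numbers, out_queue):
    # Runs in the child process; the dict is pickled and sent back over the queue.
    out_queue.put({n: n * n for n in numbers})


def run_in_child(numbers):
    # Assumption: "fork" is available (Linux and most Unix-like systems).
    ctx = mp.get_context("fork")
    out_queue = ctx.Queue()
    worker = ctx.Process(target=square_all, args=(numbers, out_queue))
    worker.start()
    result = out_queue.get()  # read before join() so the child can flush its data
    worker.join()
    return result


if __name__ == "__main__":
    print(run_in_child([1, 2, 3]))
```

The parent gets an ordinary dict back, but it is a copy rebuilt from a pickled byte stream, not the object the child created.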

2 comments

I know. The library is available for 2.5 as "processing", but starting OS processes when all you wanted were threads is somewhat ugly. I agree it's not possible to do it properly with the GIL in place, and attempts to remove it had less than amazing results for single-threaded applications.

That's annoying... Even more annoying when I think it's easier to do in Java ;-)

If you're running on Linux, then all threads are processes.
For some definitions of "are" maybe. Fact is that processes don't share address spaces and thus pointers, which means that all data structures have to be serialised in some shape or form before they can be accessed by multiple processes. This makes some use cases infeasible in Python and other GIL based languages.
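
To make the address-space point concrete, here is a rough sketch (again assuming a Unix-like system with the "fork" start method; append_marker and compare_visibility are hypothetical names). A thread mutates the parent's list directly, while a child process only mutates its own copy.

```python
import multiprocessing as mp
import threading


def append_marker(shared, marker):
    shared.append(marker)


def compare_visibility():
    data = []

    # A thread shares the parent's address space, so the append is visible here.
    t = threading.Thread(target=append_marker, args=(data, "thread"))
    t.start()
    t.join()
    after_thread = list(data)

    # A child process works on its own copy of the list; the parent's is untouched.
    ctx = mp.get_context("fork")  # assumption: Unix-like system
    p = ctx.Process(target=append_marker, args=(data, "process"))
    p.start()
    p.join()
    after_process = list(data)

    return after_thread, after_process


if __name__ == "__main__":
    print(compare_visibility())
```

To get the child's change back into the parent, it has to be sent explicitly (queue, pipe, shared memory), which is where the serialisation comes in.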

This is important and I'm concerned that so many advocates of Python (and Ruby and PHP) try to talk this huge issue away. I do use and love Python myself and I'd like to do more with it, so this is not pointless language war fighting.

Under Linux, the only difference between a thread and a process is whether the address space is shared. Both POSIX threading libraries for Linux (NPTL is the current one, LinuxThreads is the old one) do a clone() system call for every pthread_create(). The clone() system call is basically fork() with extra functionality (such as sharing the address space).

Inside the kernel, your "threads" and "processes" are both represented by the task_struct data structure. So, on Linux, threads and processes are the same thing in different flavors.
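
You can get a glimpse of this from Python. The sketch below assumes Python 3.8 or later (for threading.get_native_id()); inspect_thread_ids is a made-up helper. On Linux the native id is the kernel's task id, so each thread gets its own, while os.getpid() stays the same across threads of one process.

```python
import os
import threading


def inspect_thread_ids():
    seen = {}

    def record():
        # Runs in the second thread.
        seen["pid_in_thread"] = os.getpid()
        seen["native_id_in_thread"] = threading.get_native_id()

    t = threading.Thread(target=record)
    t.start()
    t.join()

    # Same queries from the calling thread.
    seen["pid_in_main"] = os.getpid()
    seen["native_id_in_main"] = threading.get_native_id()
    return seen


if __name__ == "__main__":
    for key, value in sorted(inspect_thread_ids().items()):
        print(key, value)
```

The two pid values match, while the two native ids differ: one process as far as userland is concerned, two schedulable tasks as far as the kernel is concerned.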

I brought this up since the parent said "starting OS processes when all you wanted were threads is somewhat ugly." My point is that on Linux, starting a "process" and starting a "thread" are fundamentally the same action and have the same cost.

They are not the same thing. Threads share address space and processes don't and that's a crucial difference.
I said the same thing in different flavors. Literally, they are both represented by the same data structure in the kernel. On the salient point of the discussion, the cost of forking a process versus the cost of spawning a thread, they are the same. They require different styles of programming, but I was responding to your point on cost.

If you ever need to reason about scheduling in the Linux kernel, then you need to understand this concept. In the eyes of the scheduler, they're all the same.

Not so seamless. Things need to be serialised to transfer them. You need to specially craft code so that it is pickle safe... things don't just work automatically.
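
A small sketch of what "pickle safe" means in practice. The can_pickle helper is hypothetical; it only wraps pickle.dumps, which is what multiprocessing uses under the hood when it sends objects between processes.

```python
import pickle
import threading


def can_pickle(obj):
    # True if pickle can serialise obj, False if it refuses.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(can_pickle({"a": [1, 2, 3]}))   # plain data structures are fine
    print(can_pickle(lambda x: x * 2))    # lambdas cannot be pickled
    print(can_pickle(threading.Lock()))   # neither can locks
```

Plain data pickles without trouble, but lambdas, locks, open sockets and similar objects do not, so code that passes them to a worker process has to be restructured.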

Also, serialising 100MB or so worth of objects to pass them around is really slow, so the multiprocessing module is not a good fit for many use cases.
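
If you want to see what that costs for your own data, a rough way to measure it is to time a pickle round trip, which approximates what happens when an object crosses a process boundary. measure_pickle_round_trip is a made-up helper, and the numbers depend entirely on your machine and the shape of the data.

```python
import pickle
import time


def measure_pickle_round_trip(obj):
    # Returns (size in bytes, seconds taken, whether the copy equals the original).
    start = time.perf_counter()
    blob = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(blob)
    elapsed = time.perf_counter() - start
    return len(blob), elapsed, restored == obj


if __name__ == "__main__":
    payload = list(range(1_000_000))
    size, seconds, intact = measure_pickle_round_trip(payload)
    print(f"{size} bytes, {seconds:.3f} s, intact={intact}")
```

This leaves out the time spent pushing the bytes through a pipe, so the real cost of handing the object to another process will be somewhat higher.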