by sagarm 5999 days ago
Does anyone know why multiple kernel threads are used at all? If only one thread can run at a time, why not just use a single kernel thread?
2 comments

Because CPython doesn't want to invent its own scheduler (and various other things that your operating system already does for you). Furthermore, multiple threads can be active at the same time if they're doing operations that release the GIL (such as I/O), and Python can't know whether your thread will do any I/O before you create it.
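To illustrate the point about operations that release the GIL, here is a minimal sketch. It uses time.sleep() as a stand-in for blocking I/O (an assumption made for the example; the helper names are hypothetical). If only one thread could ever make progress, four 0.2 s waits would take about 0.8 s in total; because the GIL is released while each thread waits, they overlap.

```python
import threading
import time

def blocking_call(seconds):
    # time.sleep releases the GIL while it waits, standing in for blocking I/O.
    time.sleep(seconds)

def run_concurrently(n_threads=4, seconds=0.2):
    # Start n_threads OS threads that each block for `seconds`, then time the lot.
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(n_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

elapsed = run_concurrently()
print(f"4 threads x 0.2 s of blocking took {elapsed:.2f} s")
```

The elapsed time should be close to a single 0.2 s wait rather than the sum of all four.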
However, the approach explained in this presentation does just that. The solution looks like a crude operating system process scheduler.
It's not really; it's simply threads asking the running thread to drop the GIL every 5 ms, and then relying on the OS to schedule them. If Python had its own scheduler it would do things like assigning priorities and actually deciding which thread would take over when the GIL was dropped.
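For reference, a small sketch of where the 5 ms figure shows up, assuming the mechanism described is the one in CPython 3.2 and later, where the interval is exposed through sys.getswitchinterval(). The interval can be tuned, but nothing here sets priorities or picks which thread runs next.

```python
import sys

# Interval (in seconds) after which a waiting thread asks the running
# thread to release the GIL. Available in CPython 3.2 and later.
interval = sys.getswitchinterval()
print(f"switch interval: {interval * 1000:.1f} ms")

# The interval is tunable, but there is no knob for priorities or for
# choosing the next thread; that decision is left to the OS.
sys.setswitchinterval(0.01)
changed = sys.getswitchinterval()
sys.setswitchinterval(interval)  # restore the original value
```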
Just to expand on the previous reply: a lot of C libraries have a blocking model where you call a function that blocks until it is complete. The simplest example is fread(). If the interpreter doesn't support multiple threads, you have to do a lot of gymnastics to make sure you never block the interpreter in an extension.
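A rough sketch of why native threads make the blocking model painless. It uses os.read() on a pipe rather than fread() (an assumption for the sake of a self-contained example): the worker thread blocks inside the call while the main thread keeps running Python code.

```python
import os
import threading

r, w = os.pipe()
result = []

def reader():
    # os.read blocks until data arrives; CPython releases the GIL meanwhile.
    result.append(os.read(r, 1024))

t = threading.Thread(target=reader)
t.start()

# The main thread keeps executing Python code while the reader is blocked.
ticks = sum(1 for _ in range(100_000))

os.write(w, b"done")  # unblocks the reader
t.join()
os.close(r)
os.close(w)
print(result[0], ticks)
```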

It can be done (this is how Ruby pre-1.9 works), but people complain about it a lot. Matz's rationale for why he added native OS threads to Ruby 1.9 was "people seem to like them."

In response to my fread() example you may be tempted to say that you can just put the underlying fd in non-blocking mode with fcntl(2). This is unfortunately unsafe (as I recall), though I cannot find the reference right now. Accessing the fd directly with read() and write() in non-blocking mode is of course safe, but then you have to do your own buffering.
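A minimal sketch of the "do your own buffering" route, assuming a pipe as the fd and using Python's os and fcntl modules in place of the C calls. The helpers set_nonblocking and drain are hypothetical names for this example.

```python
import fcntl
import os

def set_nonblocking(fd):
    # Equivalent of fcntl(fd, F_SETFL, flags | O_NONBLOCK) in C.
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def drain(fd, buffer):
    """Append whatever is available right now to buffer; never block.

    Returns True on EOF, False if the fd simply has no more data yet.
    """
    while True:
        try:
            chunk = os.read(fd, 4096)
        except BlockingIOError:
            return False   # nothing more available at the moment
        if not chunk:
            return True    # EOF: the writer closed its end
        buffer.extend(chunk)

r, w = os.pipe()
set_nonblocking(r)

buf = bytearray()
eof_before = drain(r, buf)   # empty pipe: returns immediately
os.write(w, b"hello ")
os.write(w, b"world")
drain(r, buf)
os.close(w)
eof_after = drain(r, buf)    # writer closed: EOF
os.close(r)
print(bytes(buf), eof_before, eof_after)
```

The caller owns the bytearray and decides when a complete record has arrived, which is the work that stdio's buffering would otherwise do.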

Also, while poll() and select() give you most of what you need to make your I/O non-blocking, I recall a case where a pipe will be write-ready, but if you try to write too much data at the same time the write will block anyway, instead of returning a short count of bytes written. This was on RHEL3 Linux, so things may have changed since then.

I recall a case where a pipe will be write-ready, but if you try to write too much data at the same time the write will block anyway, instead of returning a short count of bytes written

That just sounds like a bug.

No, it's the specified behavior of a Unix pipe or socket[1]. If you want it not to block, you should set the O_NONBLOCK flag using fcntl(). Being "write ready" just means that it can take at least one byte in a buffer; it can't possibly be a promise not to block if you feed it a 2G buffer.
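A short sketch of the difference, assuming a Unix pipe and an 8 MiB payload chosen only because it is larger than a typical pipe buffer. The pipe is reported as writable, yet with O_NONBLOCK set an oversized write comes back with a short count instead of blocking.

```python
import fcntl
import os
import select

r, w = os.pipe()

# An empty pipe is reported as writable...
_, writable, _ = select.select([], [w], [], 0)

# ...but that says nothing about how much it will accept. With O_NONBLOCK
# set, an oversized write returns a short count instead of blocking.
flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

payload = b"x" * (8 * 1024 * 1024)  # much larger than a typical pipe buffer
written = os.write(w, payload)
print(f"asked for {len(payload)} bytes, wrote {written}")

# The pipe is now full, so a further write fails rather than blocking.
try:
    os.write(w, b"y")
    would_block = False
except BlockingIOError:
    would_block = True

os.close(r)
os.close(w)
```

Without the fcntl() call, the same os.write() would block until a reader drained the pipe.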

[1] But not disk files. Local disk storage isn't considered "blocking" if it simply needs to do a disk seek before returning the data. Most unix geeks end up finding this out experimentally at some point in their careers and using strong language. No, I don't know what they were thinking either...

In the SUS description of poll(), it says that POLLOUT means "Normal data may be written without blocking." However, it does not say how much data. It is unexpected that a write() call would block instead of returning a short count; for example, if you feed it a 2G buffer, it could return 4096, indicating that only 4k of the 2G was actually written.
It could, but it doesn't and never has. The standard was written to admit a broader set of behavior than real systems actually implement. And in this case, there's absolutely nothing "surprising" to my eyes about a blocking (!) socket blocking on overflow.

Again: there is already a mechanism in place to support non-blocking behavior, and it's not this one.

If you want it not to block, you should set the O_NONBLOCK flag using fcntl().

Certainly, I had assumed that the OP was talking about poll over non-blocking pipes.