by wahern 702 days ago
IIUC, what they mean by "migrate" is that the client thread is paused and the server thread is given the remainder of the time slice, similar to how pipe(2) originally worked in Unix and even, I think, early Linux. It's the flow of control that "conceptually" shifts synchronously. This can provide surprising performance benefits in a lot of RPC scenarios, though less so now that TLB flushing, etc., as part of a context switch has become more costly. There are no VM shenanigans except for some page mapping optimizations for passing large chunks of data, which apparently weren't even implemented in the original Solaris implementation.

The kernel can spin up a thread on the server side, but this works just like common thread pool libraries, and I'm not sure the kernel has any special role here except to optimize context switching when there's no spare thread to service an incoming request and a new thread needs to be created. With a purely userspace implementation there may be some context switch bouncing unless an optimized primitive (e.g. some special futex mode, perhaps?) is available.

Other than maybe the file namespace attaching API (not sure of the exact semantics), and presuming I understand properly, I believe Doors, both functionally and the literal API, could be implemented entirely in userspace using Unix domain sockets, SCM_RIGHTS, and mmap. It just wouldn't have the context switching optimization without new kernel work. (See the switchto proposal for Linux from Google, though that was for threads in the same process.)
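
To make the userspace idea concrete, here is a minimal Python sketch of a door-like synchronous call over a Unix domain socket pair, with descriptors passed via SCM_RIGHTS. The names `door_call`, `door_server`, and `demo` are made up for illustration and are not the Solaris API. It assumes Python 3.9+ (for `socket.send_fds` and `socket.recv_fds`), non-empty messages small enough to arrive in one recv, and it does nothing about the context-switch handoff.

```python
import os
import socket
import threading

def door_server(sock, handler):
    """Answer door-like calls on sock until the peer closes its end."""
    while True:
        # recv_fds also collects any SCM_RIGHTS descriptors sent with the message.
        data, fds, _flags, _addr = socket.recv_fds(sock, 4096, 4)
        if not data:  # empty read means the client closed the socket
            break
        try:
            reply = handler(data, fds)
        finally:
            for fd in fds:  # the received descriptors are ours to close
                os.close(fd)
        sock.sendall(reply)

def door_call(sock, arg, fds=()):
    """Send the argument (plus optional descriptors), then block for the reply."""
    if fds:
        socket.send_fds(sock, [arg], list(fds))
    else:
        sock.sendall(arg)
    return sock.recv(4096)

def demo():
    """Pass the read end of a pipe to the server and get its contents back upper-cased."""
    client, server = socket.socketpair()  # AF_UNIX stream pair

    def handler(data, fds):
        return os.read(fds[0], 100).upper()

    t = threading.Thread(target=door_server, args=(server, handler))
    t.start()
    r, w = os.pipe()
    os.write(w, b"hello")
    os.close(w)
    try:
        return door_call(client, b"read", [r])
    finally:
        os.close(r)
        client.close()  # lets the server loop see EOF and exit
        t.join()
        server.close()
```

The caller blocks in `recv` until the server replies, which gives the synchronous call/return shape, but both sides still go through the ordinary scheduler on every hop.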

I'm basing all of this on the description of Doors at https://web.archive.org/web/20121022135943/https://blogs.ora... and http://www.rampant.org/doors/linux-doors.pdf


Not quite.

There isn't a door_recv(2) system call or equivalent.

Doors truly don't transfer messages, they transfer the thread itself. As in the thread that made a door call is now just directly executing in the address space of the callee.

They're more like i432/286/Mill CPU task gates.

> Doors truly don't transfer messages, they transfer the thread itself. As in the thread that made a door call is now just directly executing in the address space of the callee.

In somewhat anachronistic verbiage (at least in a modern software context) this may be true, but today this statement makes it sound like code from the caller process is executing in the address space of the callee process, such that miraculously the caller code can now directly reference data in the callee. AFAICT that just isn't the case, and wouldn't even make sense--i.e. how would it know the addresses without a ton of complex reflection that's completely absent from the example code? (Caller and callee don't need to have been forked from each other.) And according to the Linux implementation, the "argument" (a flat, contiguous block of data) passed from caller to callee is literally copied, either directly or by mapping in the pages. The caller even needs to provide a return buffer for the callee's returned data to be copied into (unless it's too large, in which case it's mapped in and the return argument vector is updated to point to the newly mmap'd pages). File descriptors can also be passed, and of course that requires kernel involvement.
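
A rough Python sketch of the return-buffer rule as I read it from that description: copy into the caller's buffer when the result fits, otherwise hand back a fresh mapping. `deliver_result` is a hypothetical helper, not anything from libdoor, and an anonymous `mmap` stands in for the kernel mapping new pages into the caller.

```python
import mmap

def deliver_result(result: bytes, rbuf: bytearray):
    """Return (buffer, length) for a callee result.

    buffer is the caller-supplied rbuf when the result fits; otherwise it is
    a newly created anonymous mapping that the caller must close() later.
    """
    if len(result) <= len(rbuf):
        rbuf[:len(result)] = result      # plain copy into the caller's buffer
        return rbuf, len(result)
    mapping = mmap.mmap(-1, len(result))  # assumption: stands in for kernel-mapped pages
    mapping[:] = result
    return mapping, len(result)
```

Either way the callee's data ends up as a copy in memory the caller owns; nothing here lets the caller dereference callee addresses.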

AFAICT, the trick here pertains to scheduling alone, both wrt the hardware and software systems. I.e. a lighter weight interface for the hardware task gating mechanism, like you say, reliant on the synchronous semantics of this design to skip involving the system scheduler. But all the other process attributes, including address space, are switched out, perhaps in an optimized manner as mentioned elsethread but still preserving typical process isolation semantics.

If I'm wrong, please correct me with pointers to more detailed technical documentation (or code--is this still in illumos?) because I'd love to dig more into it.

FWIW, here's the Solaris man page for libdoor: https://docs.oracle.com/cd/E36784_01/html/E36873/libdoor-3li... Did you mean door_call or door_return instead of door_recv?

I didn't imply that the code remains and it's only data that is swapped out. The thread jumps to another complete address space.

It's like a system call instruction that instead of jumping into the kernel, jumps into another user process. There's a complete swap out of code and data in most cases.

Just as the kernel doesn't need a thread pool to respond to system calls, the callee doesn't need one here. The calling thread is just directly executing in the callee address space after the door_call(2).

> Did you mean door_call or door_return instead of door_recv?

I did not. I said there is no door_recv(2) system call. The 'server' doesn't wait for messages at all.

Thanks for finding the man page!

I think what doors do is rendezvous synchronization: the caller is atomically blocked as the callee is unblocked (and vice versa on return). I don't think there is an efficient way to do that with just plain POSIX primitives or even with Linux-specific syscalls (though Binder and io_uring possibly might).
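
For reference, this is roughly the semantics I mean, sketched in Python with two semaphores. The `Rendezvous` class is purely illustrative and handles one caller at a time. Note that it is exactly the non-atomic version: the wake and the block are two separate steps, each visible to the scheduler, which is the inefficiency a door-style handoff would avoid.

```python
import threading

class Rendezvous:
    """One-caller-at-a-time call/return handoff built from two semaphores."""

    def __init__(self):
        self._request = threading.Semaphore(0)
        self._reply = threading.Semaphore(0)
        self._arg = None
        self._result = None

    def call(self, arg):
        """Caller side: publish the argument, wake the callee, block for the result."""
        self._arg = arg
        self._request.release()   # step 1: unblock the callee
        self._reply.acquire()     # step 2: block the caller (not atomic with step 1)
        return self._result

    def serve_one(self, handler):
        """Callee side: block for one request, run the handler, wake the caller."""
        self._request.acquire()
        self._result = handler(self._arg)
        self._reply.release()
```

A usage example: start `serve_one` in a second thread with some handler, then `call(arg)` from the first thread returns whatever the handler computed.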

Sounds a bit like Google's proposal for a `switchto_switch` syscall [1] that would allow for cooperative multithreading bypassing the scheduler.

(The descendant of that proposal is `sched_ext`, so maybe it is possible to implement doors in eBPF + sched_ext?)

[1]: https://youtu.be/KXuZi9aeGTw?t=900