by pcwalton 3143 days ago
> It does seem to me that a lot of people are a bit bedazzled by the top-level stuff that various languages offer, and forget that under the hood, everyone's using the event-based interfaces.

Yup. It's all very similar under the hood.

The most important difference between I/O models is whether the paradigm involves explicit vs. implicit management of the event loop. Callback models like Node, async/await style models like those of C#, and low-level primitives like IOCP, epoll, and kqueue fall into the former category. Go/Erlang, plain old threads, and even Unix processes fall into the latter category. There are advantages and disadvantages of each model.
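To make the explicit/implicit split concrete, here is a minimal sketch in Python (not any particular runtime's actual implementation). The function names `upper_explicit` and `upper_implicit` are made up for illustration. Both do the same work over a socket pair: in the first, the program owns the readiness loop; in the second, a thread simply blocks in `recv` and the scheduler does the waiting for it.

```python
import selectors
import socket
import threading


def upper_explicit(messages):
    """Explicit model: the program owns the loop and dispatches on readiness."""
    a, b = socket.socketpair()
    sel = selectors.DefaultSelector()  # epoll/kqueue/etc. depending on the OS
    sel.register(b, selectors.EVENT_READ)
    results = []
    try:
        for msg in messages:
            a.sendall(msg)
            # We ask the kernel which sockets are ready, then act on them.
            for key, _events in sel.select():
                results.append(key.fileobj.recv(1024).upper())
    finally:
        sel.close()
        a.close()
        b.close()
    return results


def upper_implicit(messages):
    """Implicit model: straight-line blocking code; the scheduler parks us."""
    a, b = socket.socketpair()

    def worker():
        while True:
            data = b.recv(1024)  # blocks until data arrives or the peer closes
            if not data:
                break
            b.sendall(data.upper())

    t = threading.Thread(target=worker)
    t.start()
    results = []
    for msg in messages:  # assumes non-empty messages
        a.sendall(msg)
        results.append(a.recv(1024))
    a.close()  # EOF lets the worker fall out of its loop
    t.join()
    b.close()
    return results
```

Under the hood the blocking version still waits on the same kind of kernel readiness notification; the difference is only who writes the loop.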

Within each of these broad categories, the distinctions are, IMHO, much less interesting, and they're often made out to be more significant than they actually are. In particular, the distinction between runtimes like Go and regular OS pthreads is often made out to be more important than it really is, when the difference ultimately boils down to the CPU privilege level that thread management runs at.

Patrick, on the 2.6+ Linux kernels, is there a significant difference between threads and processes? It seems like both threads and processes are created via clone and the only difference is memory access?

I often hear "context switching between threads is cheaper" but pthreads still have their own PID and everything, so is this really the case?

Is there really much advantage to pthreads over the way PostgreSQL does things with efficient CoW sharing between processes for the binary?

The significance of the distinction depends entirely on the use case.

Yes, they’re both created with clone, but with different levels of sharing. A pthread will share the virtual address space of its parent, which makes shared memory simple to implement; use the same pointer and you’re done. CoW is not “sharing” really, because you can’t communicate over it, it just saves some creation overhead.
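A rough illustration of the sharing difference, sketched in Python rather than with raw `clone` flags (the function names are hypothetical, and `os.fork` makes this Unix-only). A write made by a thread is visible to the parent because both use the same address space; a write made by a forked child lands in the child's private CoW copy and the parent never sees it.

```python
import os
import threading


def write_from_thread():
    """A thread shares the address space, so its write is visible afterwards."""
    data = [0]

    def worker():
        data[0] = 42

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return data[0]


def write_from_forked_child():
    """A forked child writes to its own CoW copy; the parent's copy is untouched."""
    data = [0]
    pid = os.fork()
    if pid == 0:
        data[0] = 42  # triggers a private copy of the page in the child
        os._exit(0)   # leave immediately without running parent cleanup code
    os.waitpid(pid, 0)
    return data[0]
```

The first function returns 42 and the second returns 0, which is the "you can't communicate over it" point in practice.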

With CoW, technically nothing gets copied initially, but as soon as the new process starts executing, it’s going to start copying the pages it writes to: its stack first, then any other regions it modifies. With a pthread nothing is copied at all; the new thread just gets its own freshly allocated stack.

Context switches are usually cheaper when you don’t need to throw out the old virtual address space (and invalidate the Translation Lookaside Buffer). Pthreads share virtual address space, so there is no need to flush the TLB.

In a use case like Postgres, you don’t necessarily need to optimise for context switches. If you have a lot of concurrent connections, each of which has one process, then you’ll only hit limits with context switching overhead if very few of those connections are fighting over any locks or spending much time in IO at all. This is atypical, so usually those other factors hit you first.

> The significance of the distinction depends entirely on the use case.

Indeed.

> Context switches are usually cheaper when you don’t need to throw out the old virtual address space (and invalidate the Translation Lookaside Buffer). Pthreads share virtual address space, so there is no need to flush the TLB.

I believe the cost of that has been reduced somewhat due to tagged TLBs on modern hardware.

> In a use case like Postgres, you don’t necessarily need to optimise for context switches. If you have a lot of concurrent connections, each of which has one process, then you’ll only hit limits with context switching overhead if very few of those connections are fighting over any locks or spending much time in IO at all. This is atypical, so usually those other factors hit you first.

Yea. There's a number of limitations in postgres due to the process model, but they're imo not TLB / context switch related. The biggest issue is that dynamically sharing memory between processes is harder, because there's no guarantee that post-fork memory allocations can portably be mapped at the same virtual address in every process. That in turn makes shared data structures more complicated, because you need to use relative pointers and such. That's not a problem for the main buffer pool etc, which is allocated when postgres is started, but it is problematic e.g. for memory shared between multiple processes working on the same query (say the memory for a shared hashtable in a hashjoin).
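For readers who haven't run into relative pointers, a small sketch of the idea in Python (this is not Postgres code; `NODE`, `build_list` and `walk` are names invented here). A linked list is stored inside a buffer, and each link is an offset from the start of the buffer rather than an address, so the same bytes stay valid wherever the buffer ends up mapped. The two anonymous `mmap` regions in the test below stand in for one segment mapped at different addresses in two processes.

```python
import mmap
import struct

# Each node is (value, offset of next node); an offset of -1 ends the list.
NODE = struct.Struct("<qq")


def build_list(buf, values):
    """Write nodes back to back into buf and return the head offset."""
    if not values:
        return -1
    for i, value in enumerate(values):
        nxt = (i + 1) * NODE.size if i + 1 < len(values) else -1
        NODE.pack_into(buf, i * NODE.size, value, nxt)
    return 0


def walk(buf, head):
    """Follow the offsets; no absolute address is ever stored or needed."""
    out = []
    off = head
    while off != -1:
        value, off = NODE.unpack_from(buf, off)
        out.append(value)
    return out
```

With absolute pointers, copying or remapping the buffer would leave every link pointing into the old location.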

> you need to use relative pointers and such

I don't think this qualifies as a performance overhead, though, beyond the odd isub.

> > you need to use relative pointers and such

> I don't think this qualifies as a performance overhead, though, beyond the odd isub.

It ends up as one. The reason is less the additional instruction(s) than that you actually need to ferry around additional data. In common scenarios you'll end up with a number of mappings shared between processes, so you can't just assume a single base address per process. Instead you have to associate each relative pointer with its specific mapping, and that does add overhead, both in programming effort and in runtime efficiency.
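A hedged sketch of what that extra baggage looks like, in Python (the `RelPtr` and `SegmentTable` types are hypothetical, not taken from any real codebase). Once there are several shared mappings, a pointer has to name its segment as well as its offset, and every dereference goes through a per-process table lookup before it can touch memory.

```python
import struct
from typing import Dict, NamedTuple


class RelPtr(NamedTuple):
    """A pointer that is only meaningful together with its segment."""
    segment: int
    offset: int


class SegmentTable:
    """Per-process map from segment id to wherever that segment is attached."""

    def __init__(self):
        self._segments: Dict[int, bytearray] = {}

    def attach(self, segment_id, buf):
        self._segments[segment_id] = buf

    def store_i64(self, ptr, value):
        # Extra step on every access: resolve the segment before writing.
        struct.pack_into("<q", self._segments[ptr.segment], ptr.offset, value)

    def load_i64(self, ptr):
        # Same extra step on reads.
        return struct.unpack_from("<q", self._segments[ptr.segment], ptr.offset)[0]
```

Compared with a plain address, the pointer itself is wider and each access pays for the lookup, on top of having to thread the table through the code.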