Hacker News
by cormacrelf 3146 days ago
The significance of the distinction depends entirely on the use case.

Yes, they’re both created with clone, but with different levels of sharing. A pthread shares the virtual address space of its parent, which makes shared memory simple to implement: use the same pointer and you’re done. CoW isn’t really “sharing”, because you can’t communicate over it; it just saves some creation overhead.
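Here is a minimal Python sketch of the sharing difference. It assumes a Unix system where `os.fork` is available. On Linux, CPython's `threading` module is built on pthreads. The helper names `bump`, `value_after_thread` and `value_after_fork` are made up for illustration. When a thread writes to a global, the creator sees the change. When a forked child does the same, the write lands in the child's own copy-on-write page, and the parent never sees it.

```python
import os
import threading

counter = 0  # module-level state that both the thread and the forked child modify


def bump():
    global counter
    counter += 1


def value_after_thread():
    """Run bump() in a thread; it shares our address space, so we see its write."""
    global counter
    counter = 0
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter


def value_after_fork():
    """Run bump() in a forked child; its write goes to the child's private copy."""
    global counter
    counter = 0
    pid = os.fork()
    if pid == 0:  # child process
        bump()
        os._exit(counter)  # report the child's view of counter via its exit status
    _, status = os.waitpid(pid, 0)
    return counter, os.WEXITSTATUS(status)  # (parent's view, child's view)
```

`value_after_thread()` returns `1`. `value_after_fork()` returns `(0, 1)`: the child saw its own increment, but the parent's `counter` is untouched. To communicate after a fork you need an explicit channel, such as a pipe, a socket or a `MAP_SHARED` mapping.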

With CoW, technically no memory gets copied up front (apart from the page tables). As soon as the new process starts executing, though, each page it writes to, starting with its stack, gets copied on that first write. With a pthread you can be certain nothing is copied: the new thread just gets its own freshly allocated stack.
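Copy-on-write copies a page on the first *write*, not on any use. You can roughly observe this through the minor page-fault counter. The sketch below assumes Linux, where a CoW fault is counted in `ru_minflt`, and Python 3.8+ for `mmap.madvise`. `NPAGES` and the function names are arbitrary choices for this illustration, and the counts include some noise from the interpreter itself. The parent fills a private anonymous mapping and forks. The child then reads every page and afterwards writes every page.

```python
import mmap
import os
import resource

PAGE = mmap.PAGESIZE
NPAGES = 1024  # arbitrary; large enough to dwarf faults caused by the interpreter


def minor_faults():
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def cow_fault_counts():
    """Return (faults while the child reads, faults while it writes) a buffer
    that the parent populated before fork()."""
    buf = mmap.mmap(-1, PAGE * NPAGES, flags=mmap.MAP_PRIVATE)  # anonymous, private
    if hasattr(mmap, "MADV_NOHUGEPAGE"):
        buf.madvise(mmap.MADV_NOHUGEPAGE)  # keep the copy granularity at one page
    for i in range(NPAGES):
        buf[i * PAGE] = 1  # fault every page in before forking
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(r)
        start = minor_faults()
        total = 0
        for i in range(NPAGES):
            total += buf[i * PAGE]  # reading leaves the pages shared with the parent
        after_read = minor_faults()
        for i in range(NPAGES):
            buf[i * PAGE] = 2  # the first write to each page triggers a private copy
        after_write = minor_faults()
        os.write(w, f"{after_read - start} {after_write - after_read}".encode())
        os._exit(0)
    os.close(w)
    data = os.read(r, 64)
    os.close(r)
    os.waitpid(pid, 0)
    buf.close()
    read_faults, write_faults = map(int, data.split())
    return read_faults, write_faults
```

On a typical Linux box the read pass should cost next to nothing. The write pass should take roughly one fault per page, because each page is copied the first time the child writes to it. Huge pages are disabled for the mapping where the platform allows it, so that copying happens page by page.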

Context switches are usually cheaper when you don’t need to throw out the old virtual address space (and invalidate the Translation Lookaside Buffer). Pthreads share virtual address space, so there is no need to flush the TLB.

In a use case like Postgres, you don’t necessarily need to optimise for context switches. If you have a lot of concurrent connections, each of which has one process, then you’ll only hit limits with context switching overhead if very few of those connections are fighting over any locks or spending much time in IO at all. This is atypical, so usually those other factors hit you first.

1 comments

> The significance of the distinction depends entirely on the use case.

Indeed.

> Context switches are usually cheaper when you don’t need to throw out the old virtual address space (and invalidate the Translation Lookaside Buffer). Pthreads share virtual address space, so there is no need to flush the TLB.

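I believe the cost of that has been reduced somewhat by tagged TLBs on modern hardware (e.g. PCID on x86-64, ASIDs on ARM). These let entries from different address spaces coexist, so a switch doesn't have to flush the whole TLB.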

> In a use case like Postgres, you don’t necessarily need to optimise for context switches. If you have a lot of concurrent connections, each of which has one process, then you’ll only hit limits with context switching overhead if very few of those connections are fighting over any locks or spending much time in IO at all. This is atypical, so usually those other factors hit you first.

Yeah. There are a number of limitations in Postgres due to the process model, but imo they're not TLB / context switch related. The biggest issue is that dynamically sharing memory between processes is harder. There's no portable guarantee that memory allocated after fork can be mapped at the same virtual address in every process. That makes it more complicated to have shared data structures, because you need to use relative pointers and such. That's not a problem for the main buffer pool etc. It is allocated when Postgres is started, before any backend is forked, so every process inherits it at the same address. It is problematic, though, for memory shared between multiple processes working on the same query (say, the memory for a shared hash table in a hash join).
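To show what "relative pointers" means in practice, here is a minimal Python sketch (not Postgres's actual code). It stores a linked list in a file-backed shared mapping, using offsets from the start of the mapping instead of absolute addresses. Mapping the same file twice gives two different base addresses in one process. This stands in for two processes that mapped the segment at different places, and the list reads back correctly through either mapping. The struct layout and the helper names `build_list`, `read_list` and `demo_offsets` are assumptions for illustration.

```python
import mmap
import struct
import tempfile

HEADER = struct.Struct("<q")  # offset of the first node; stored at offset 0
NODE = struct.Struct("<qq")   # (value, offset of next node); 0 means "end of list"


def build_list(buf, values):
    """Write values as a singly linked list whose links are offsets into buf."""
    head = 0
    off = HEADER.size + NODE.size * len(values)
    for v in reversed(values):  # build back to front so each node links to the old head
        off -= NODE.size
        NODE.pack_into(buf, off, v, head)
        head = off
    HEADER.pack_into(buf, 0, head)


def read_list(buf):
    """Follow the offset links; works no matter where buf is mapped."""
    out = []
    (off,) = HEADER.unpack_from(buf, 0)
    while off != 0:  # offset 0 is the header, so it can never be a node
        value, off = NODE.unpack_from(buf, off)
        out.append(value)
    return out


def demo_offsets(values):
    with tempfile.TemporaryFile() as f:
        f.truncate(mmap.PAGESIZE)
        writer = mmap.mmap(f.fileno(), mmap.PAGESIZE)  # MAP_SHARED by default
        reader = mmap.mmap(f.fileno(), mmap.PAGESIZE)  # second mapping, different address
        build_list(writer, values)
        result = read_list(reader)
        writer.close()
        reader.close()
    return result
```

`demo_offsets([10, 20, 30])` returns `[10, 20, 30]`. The price is that every dereference becomes "base plus offset" rather than a plain pointer load.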

> you need to use relative pointers and such

I don't think this qualifies as a performance overhead, though, beyond the odd isub.

> > you need to use relative pointers and such

> I don't think this qualifies as a performance overhead, though, beyond the odd isub.

It ends up as one. The reason is less the additional instruction(s) than that you actually need to ferry around additional data. In common scenarios you'll end up with a number of mappings shared between processes, so you can't just assume a single base address per process. Instead you have to associate each relative pointer with the specific mapping it points into. That does add overhead, both in programming effort and in runtime efficiency.
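This sketch illustrates the "ferry around additional data" point. It assumes a pointer must identify both a mapping and an offset within it. A hypothetical 64-bit encoding packs a segment number into the high bits and the offset into the low 40 bits. Each process resolves that value through its own table of mappings. The names, the 40-bit split and the file-backed setup are illustrative assumptions, not Postgres's actual scheme.

```python
import mmap
import struct
import tempfile

OFFSET_BITS = 40  # assumption: every offset within one segment fits in 40 bits
OFFSET_MASK = (1 << OFFSET_BITS) - 1
PTR = struct.Struct("<Q")    # a stored (segment, offset) pointer
VALUE = struct.Struct("<q")  # the payload the pointer refers to


def make_ptr(segment, offset):
    """Pack (segment number, offset within that segment) into one 64-bit value."""
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in OFFSET_BITS")
    return (segment << OFFSET_BITS) | offset


def resolve(mappings, ptr):
    """Decode ptr against *this* process's own table of mappings."""
    return mappings[ptr >> OFFSET_BITS], ptr & OFFSET_MASK


def demo_segment_pointer():
    files = [tempfile.TemporaryFile() for _ in range(2)]
    try:
        for f in files:
            f.truncate(mmap.PAGESIZE)
        # Two separate mapping tables over the same two files stand in for two
        # processes that mapped the same segments at different addresses.
        table_a = [mmap.mmap(f.fileno(), mmap.PAGESIZE) for f in files]
        table_b = [mmap.mmap(f.fileno(), mmap.PAGESIZE) for f in files]
        # "Process A": store 42 in segment 1 and a pointer to it in segment 0.
        VALUE.pack_into(table_a[1], 64, 42)
        PTR.pack_into(table_a[0], 0, make_ptr(1, 64))
        # "Process B": load the pointer and follow it through its own mappings.
        (ptr,) = PTR.unpack_from(table_b[0], 0)
        buf, off = resolve(table_b, ptr)
        (value,) = VALUE.unpack_from(buf, off)
        for m in table_a + table_b:
            m.close()
        return value
    finally:
        for f in files:
            f.close()
```

`demo_segment_pointer()` returns `42`. A single-base offset needs only one addition. Here every dereference also needs a shift, a mask and a table lookup, and every stored pointer has to carry the segment number along with it.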