by Acconut 3143 days ago
> It's for building specialized services such as key value stores, L7 proxies, static websites, etc.

First of all, thank you for publishing this project. It's very interesting in my opinion, since I had never thought about the benefits of an event loop. Would you mind briefly explaining why an event loop is a better fit for these applications? Is it due to performance and efficiency?


I'd suggest that's not the right way to look at it. To a first approximation, "everything" is using an event loop nowadays, in that everything is using the same fundamental primitives to handle and dispatch events. In particular, this includes the Go runtime; run "strace" on a Go network program and you'll see these same calls pop up in the strace.

What this does instead is give a Go program direct access to the event loop. The benefit is that it bypasses everything Go wraps around its internal event loop calls in order to offer you a thread-like interface, integrate with channels and the other concurrency primitives, maintain your position in the call stack between events, etc. The penalty is... the exact same thing: you lose all the nice stuff the Go runtime provides to implement that thread-like interface, and you are back to a lower-level interface that offers fewer services.

The performance of the Go runtime is "pretty good", especially by scripting language standards, but if you have sufficiently high performance requirements, you will not want to pay the overhead. The pathological case for all of these nice high-level abstractions is a server that handles a ton of network traffic of some sort and needs to do a little something to every request, maybe just a couple dozen cycles' worth of something, at which point paying what could be a few hundred cycles for all this runtime nice stuff that you're not using becomes a significant performance drain. Most people are not doing things where they can service a network request in a few dozen cycles, and the longer it takes to service a single request, the more sense it makes to have a nice runtime layer providing you useful services, since the runtime's cost shrinks as a percentage of the CPU time your program consumes. For the most part, if you are so much as hitting a database over a network connection in your request, even a local one, you've already greatly exceeded the amount of time you're paying to the runtime.

It does seem to me that a lot of people are a bit bedazzled by the top-level stuff that various languages offer, and forget that under the hood, everyone's using the event-based interfaces. What differs between Node and Twisted and all of the dozens or hundreds of other viable wrappers over these calls is the services automatically provided, not whether or not they are "event loops". Go is an event loop at the kernel level. Node is an event loop at the kernel level. Erlang is an event loop at the kernel level. They aren't all the same, but "event-based" vs. "not event-based" is not the distinction; it's a question of what they lay on top of the underlying event loop, not whether they use it. Even pure OS threads are, ultimately, event loops under the hood, just in the kernel rather than the user space.

> It does seem to me that a lot of people are a bit bedazzled by the top-level stuff that various languages offer, and forget that under the hood, everyone's using the event-based interfaces.

Yup. It's all very similar under the hood.

The most important difference between I/O models is whether the paradigm involves explicit vs. implicit management of the event loop. Callback models like Node, async/await style models like those of C#, and low-level primitives like IOCP, epoll, and kqueue fall into the former category. Go/Erlang, plain old threads, and even Unix processes fall into the latter category. There are advantages and disadvantages of each model.

Within each of these broad categories, the distinctions are, IMHO, much less interesting, and they're often made out to be more significant than they actually are. In particular, the distinction between runtimes like Go and regular OS pthreads is often made out to be more important than it really is, when the difference ultimately boils down to the CPU privilege level that thread management runs at.

Patrick, on the 2.6+ Linux kernels, is there a significant difference between threads and processes? It seems like both threads and processes are created via clone and the only difference is memory access?

I often hear "context switching between threads is cheaper" but pthreads still have their own PID and everything, so is this really the case?

Is there really much advantage to pthreads over the way PostgreSQL does things with efficient CoW sharing between processes for the binary?

The significance of the distinction depends entirely on the use case.

Yes, they’re both created with clone, but with different levels of sharing. A pthread will share the virtual address space of its parent, which makes shared memory simple to implement; use the same pointer and you’re done. CoW is not “sharing” really, because you can’t communicate over it, it just saves some creation overhead.

With CoW, technically nothing gets copied initially, but as soon as the new process starts executing, it’s going to start copying the stack frame and any other regions it’s using. With a pthread you can be certain it will just copy the stack.

Context switches are usually cheaper when you don’t need to throw out the old virtual address space (and invalidate the Translation Lookaside Buffer). Pthreads share virtual address space, so there is no need to flush the TLB.

In a use case like Postgres, you don’t necessarily need to optimise for context switches. If you have a lot of concurrent connections, each of which has one process, then you’ll only hit limits with context switching overhead if very few of those connections are fighting over any locks or spending much time in IO at all. This is atypical, so usually those other factors hit you first.

> The significance of the distinction depends entirely on the use case.

Indeed.

> Context switches are usually cheaper when you don’t need to throw out the old virtual address space (and invalidate the Translation Lookaside Buffer). Pthreads share virtual address space, so there is no need to flush the TLB.

I believe the cost of that has been reduced somewhat due to tagged TLBs on modern hardware.

> In a use case like Postgres, you don’t necessarily need to optimise for context switches. If you have a lot of concurrent connections, each of which has one process, then you’ll only hit limits with context switching overhead if very few of those connections are fighting over any locks or spending much time in IO at all. This is atypical, so usually those other factors hit you first.

Yea. There are a number of limitations in postgres due to the process model, but they're imo not TLB / context switch related. The biggest issue is that dynamically sharing memory between processes is harder, because there's no guarantee that all post-fork memory allocations can portably be put at the same virtual addresses. That in turn makes it more complicated to have shared data structures, because you need to use relative pointers and such. That's not a problem for the main buffer pool etc., which is allocated when postgres is started, but it is problematic e.g. for memory shared between multiple processes working on the same query (say the memory for a shared hashtable in a hashjoin).

> you need to use relative pointers and such

I don't think this qualifies as a performance overhead, though, beyond the odd isub.